Abstract

We construct a density estimator in the bivariate uniform deconvolution model. For 
this model we derive four inversion formulas to express the bivariate density that we want 
to estimate in terms of the bivariate density of the observations. By substituting a kernel 
density estimator of the density of the observations we then get four different estimators. 
Next we construct an asymptotically optimal convex combination of these four estimators. 
Expansions for the bias, variance, as well as asymptotic normality, are derived. Some sim- 
ulated examples are presented. 
1 Introduction

Before focusing on bivariate deconvolution let us first consider univariate deconvolution. Let
$X_1, \ldots, X_n$ be i.i.d. observations, where $X_i = Y_i + Z_i$ and $Y_i$ and $Z_i$ are independent. Assume
that the unobservable $Y_i$ have distribution function $F$ and density $f$. Also assume that the
unobservable random variables $Z_i$ have a known density $k$. If the $Z_i$ are uniformly distributed
then we have a uniform deconvolution problem. Note that the density $g$ of $X_i$ is equal to the
convolution of $f$ and $k$, so $g = k * f$ where $*$ denotes convolution. So we have

\[ g(x) = \int_{-\infty}^{\infty} k(x-u) f(u)\,du. \tag{1} \]

The deconvolution problem is the problem of estimating $f$ or $F$ from the observations $X_i$.
Several generally applicable methods have been proposed for this deconvolution model.
The standard Fourier type kernel density estimator for deconvolution problems is based on the
Fourier transform, see for instance Wand and Jones (1995). Let $w$ denote a kernel function and
$h > 0$ a bandwidth. The estimator $f_{nh}(x)$ of the density $f$ at the point $x$ is defined as

\[ f_{nh}(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx}\, \frac{\phi_w(ht)\,\phi_{emp}(t)}{\phi_k(t)}\,dt = \frac{1}{nh} \sum_{j=1}^{n} v_h\Big(\frac{x - X_j}{h}\Big), \tag{2} \]

with

\[ v_h(u) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{\phi_w(s)}{\phi_k(s/h)}\, e^{-isu}\,ds \quad\text{and}\quad \phi_{emp}(t) = \frac{1}{n} \sum_{j=1}^{n} e^{itX_j} \]

the empirical characteristic function, and $\phi_w$ and $\phi_k$ denote the characteristic functions of $w$
and $k$ respectively. An important condition for these estimators to be properly defined is that
the characteristic function $\phi_k$ of the density $k$ has no zeros, which renders the method useless for
uniform deconvolution. In fact, Hu and Ridder (2004) argue that in economic applications this
assumption is not reasonable since many distributions with a bounded support have
characteristic functions with zeros on the real line. They propose an approximation of the Fourier
transform estimator in such cases. For other modifications of the Fourier inversion method in
this problem see Hall and Meister (2007), Feuerverger, Kim and Sun (2008), Meister (2008) and
Delaigle and Meister (2011).

In some univariate deconvolution problems one can apply nonparametric maximum
likelihood. In the uniform deconvolution problem for instance the error $Z$ is Uniform$[0, 1)$
distributed. So in this particular deconvolution problem we assume to have i.i.d. observations
from the density

\[ g(x) = \int_{-\infty}^{\infty} I_{[0,1)}(x-u) f(u)\,du = \int_{x-1}^{x} f(u)\,du = F(x) - F(x-1). \tag{3} \]

Groeneboom and Jongbloed (2003) consider density estimation in this problem. They propose a 
kernel density estimator based on the nonparametric maximum likelihood estimator (NPMLE) 
of the distribution function F and derive its asymptotic properties. For estimators of the 
distribution function in uniform deconvolution, related to the NPMLE, we refer to Groeneboom 
and Wellner (1992), Van Es and Van Zuijlen (1996) and Donauer, Groeneboom and Jongbloed 
(2009). 

A selected group of deconvolution problems allows explicit inversion formulas of (1)
expressing the density of interest $f$ in terms of the density $g$ of the data. In these cases we can
estimate $f$ by substituting for instance a direct kernel density estimate of $g$ in the inversion
formula. In Van Es and Kok (1998) this strategy has been pursued for deconvolution problems
where $k$ equals the exponential density, the Laplace density, and their repeated convolutions.

If we apply inversion to the uniform problem then it turns out we get two obvious inversion
formulas. Of course these inversions agree on the set of densities of the form (3), but they are
different outside of this set. Plugging in a kernel estimator of the density $g$ of the observations,
which is typically not of this form, then yields two estimators of $f$. These can then in some
sense be optimally combined in a convex combination. This approach is developed in Van Es
(2011). Here we will follow this approach in the bivariate uniform deconvolution setting.
Let us now consider bivariate deconvolution. The bivariate convolution formula for $X_i = Y_i + Z_i$,
where $X_i$, $Y_i$ and $Z_i$ stand for two dimensional random vectors, can be written in vector
notation as

\[ g(x) = \int_{\mathbb{R}^2} k(x-u) f(u)\,du, \tag{4} \]

where $g$, $f$ and $k$ now denote the bivariate densities of $X_i$, $Y_i$ and $Z_i$.

The estimation principles described above can in principle all be attempted in the bivariate 
problem as well. See for instance Youndje and Wells (2008) for recent results on multivariate 
Fourier type kernel deconvolution. Approaches based on nonparametric maximum likelihood 
and inversion hardly exist to our knowledge. 

In the bivariate uniform deconvolution setting the random vector $Z_i$ has a Uniform$([0, 1) \times
[0, 1))$ distribution, i.e. it is uniformly distributed on the unit square. Here we can also express
the bivariate density $g$ of the observations in terms of the bivariate distribution function $F$,
with density $f$, of the random vector $Y$. We have

\[ \begin{aligned}
g(x_1,x_2) &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} I_{[0,1)}(x_1-u_1)\, I_{[0,1)}(x_2-u_2)\, f(u_1,u_2)\,du_1\,du_2 \\
&= \int_{x_2-1}^{x_2}\int_{x_1-1}^{x_1} f(u_1,u_2)\,du_1\,du_2 \\
&= F(x_1,x_2) - F(x_1,x_2-1) - F(x_1-1,x_2) + F(x_1-1,x_2-1).
\end{aligned} \tag{5} \]

This is the bivariate analogue of formula (3). Note that, again, the Fourier inversion approach
cannot be used because of the zeros in the characteristic function of the bivariate uniform
distribution.
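Formula (5) can be checked numerically for a concrete choice of $F$. The sketch below is an illustration only: it assumes, purely for the sake of the example, that $Y_1$ and $Y_2$ are independent standard normal (a choice not made in the paper), and compares the four-term expression in (5) with a midpoint-rule integral of $f$ over the square $(x_1-1,x_1] \times (x_2-1,x_2]$. All function names are hypothetical.

```python
import math
import numpy as np

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    """Standard normal density, vectorised."""
    return np.exp(-0.5 * np.asarray(x) ** 2) / math.sqrt(2.0 * math.pi)

def F(x1, x2):
    """Assumed example: Y1, Y2 independent standard normal."""
    return norm_cdf(x1) * norm_cdf(x2)

def g_from_F(x1, x2):
    """Right-hand side of (5)."""
    return F(x1, x2) - F(x1, x2 - 1) - F(x1 - 1, x2) + F(x1 - 1, x2 - 1)

def g_by_integration(x1, x2, m=400):
    """Midpoint-rule integral of f over (x1-1, x1] x (x2-1, x2]."""
    u1 = x1 - 1 + (np.arange(m) + 0.5) / m
    u2 = x2 - 1 + (np.arange(m) + 0.5) / m
    return float(np.outer(norm_pdf(u1), norm_pdf(u2)).sum() / m**2)

if __name__ == "__main__":
    print(g_from_F(0.3, 0.7), g_by_integration(0.3, 0.7))
```

The two printed numbers should agree up to the discretisation error of the midpoint rule.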

Apart from being of theoretical interest, bivariate uniform deconvolution is also of interest
because of its relation to what one might call quadrant censoring or bivariate current status
data, i.e. a bivariate version of univariate Type I interval censoring. This censoring problem
can be described as follows. For convenience we restrict ourselves to the unit square. Consider
$n$ i.i.d. random points $T_i$, $i = 1, \ldots, n$, with $T_i = (T_{i1}, T_{i2})$, in the unit square. Furthermore we
have $n$ i.i.d. unobservable random points $X_i$, $i = 1, \ldots, n$, with $X_i = (X_{i1}, X_{i2})$, also in the unit
square. For each $i$ we observe whether $X_i$ is in quadrant 1, 2, 3 or 4 relative to the known point
$T_i$. Let us quantify these observations by the discrete random variable $\Delta_i$. So we have

\[ \Delta_i = \begin{cases}
1, & \text{if } X_{i1} > T_{i1} \text{ and } X_{i2} > T_{i2}, \\
2, & \text{if } X_{i1} \le T_{i1} \text{ and } X_{i2} > T_{i2}, \\
3, & \text{if } X_{i1} \le T_{i1} \text{ and } X_{i2} \le T_{i2}, \\
4, & \text{if } X_{i1} > T_{i1} \text{ and } X_{i2} \le T_{i2}.
\end{cases} \tag{6} \]

This problem is related to uniform deconvolution by a transformation of the data. Assume
that the unobserved $X_i$ have a bivariate density $f$. The statistical problem is to estimate this
density from the observations $(T_1, \Delta_1), \ldots, (T_n, \Delta_n)$.

Consider the following transformation of the points $T_i$,

\[ V_i = (V_{i1}, V_{i2}) = \begin{cases}
(T_{i1}+1,\, T_{i2}+1), & \text{if } \Delta_i = 1, \\
(T_{i1},\, T_{i2}+1), & \text{if } \Delta_i = 2, \\
(T_{i1},\, T_{i2}), & \text{if } \Delta_i = 3, \\
(T_{i1}+1,\, T_{i2}), & \text{if } \Delta_i = 4.
\end{cases} \tag{7} \]

It can be shown that if the density $f$ is concentrated on the unit square and if the observation
points $T_i$ are uniformly distributed on the unit square then the density of the random points
$V_i$ is identical to (5). This shows that a method for bivariate uniform deconvolution of the
type developed here can also be used in quadrant censoring.
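The quadrant indicator (6) and the transformation (7) are simple to express in code. The following is a minimal sketch, not taken from the paper; the function names `quadrant` and `transform` are hypothetical, and ties $X_{ij} = T_{ij}$ (which have probability zero under a density) are assigned to the "not larger" side.

```python
import numpy as np

def quadrant(X, T):
    """Quadrant indicator Delta_i of (6) for rows of X relative to rows of T."""
    right = X[:, 0] > T[:, 0]
    up = X[:, 1] > T[:, 1]
    return np.where(right & up, 1,
           np.where(~right & up, 2,
           np.where(~right & ~up, 3, 4)))

def transform(T, delta):
    """Transformation (7): shift T by one unit in each coordinate where X exceeds T."""
    shift1 = np.isin(delta, (1, 4)).astype(float)  # first coordinate exceeds T_i1
    shift2 = np.isin(delta, (1, 2)).astype(float)  # second coordinate exceeds T_i2
    return T + np.column_stack([shift1, shift2])

if __name__ == "__main__":
    T = np.array([[0.2, 0.3]] * 4)
    X = np.array([[0.5, 0.9], [0.1, 0.9], [0.1, 0.1], [0.5, 0.1]])
    delta = quadrant(X, T)
    print(delta)
    print(transform(T, delta))
```

The transformed points $V_i$ can then be passed to a bivariate uniform deconvolution estimator in place of the observations.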

The main aim of this paper is to develop the inversion approach of Van Es (2011) for
bivariate uniform deconvolution. In Chapter 2 we derive four inversion formulas for (5). This
yields the same number of possible estimators if we plug in a density estimator of the density $g$
of the observations. In Chapter 3 we combine these estimators in a convex combination which
is asymptotically optimal in some sense. The weights of this combination turn out to depend
on the unknown distribution $F$. A general theorem for an estimator with estimated weights is
given in Chapter 4. We also present specific estimators of these weights. Simulated examples
are presented in Chapter 5. Chapter 6 contains the proofs.



2 Inversion formulas 

Recall that the density of the $Z_i$ is equal to $k(z_1, z_2) = I_{[0,1)\times[0,1)}(z_1, z_2) = I_{[0,1)}(z_1)\, I_{[0,1)}(z_2)$.
This yields formula (5) which expresses $g(x_1,x_2)$ in terms of $F(x_1,x_2)$. Lemma 2.1 below
demonstrates that the converse is also feasible.
First note that for

\[ \begin{aligned}
F^{--}(y_1,y_2) &:= \Pr(Y_1 \le y_1,\, Y_2 \le y_2), \\
F^{-+}(y_1,y_2) &:= \Pr(Y_1 \le y_1,\, Y_2 > y_2), \\
F^{+-}(y_1,y_2) &:= \Pr(Y_1 > y_1,\, Y_2 \le y_2), \\
F^{++}(y_1,y_2) &:= \Pr(Y_1 > y_1,\, Y_2 > y_2),
\end{aligned} \]

the following equalities hold

\[ F^{--}(x_1,x_2) = F(x_1,x_2), \tag{8} \]
\[ F^{-+}(x_1,x_2) = F_{Y_1}(x_1) - F(x_1,x_2), \tag{9} \]
\[ F^{+-}(x_1,x_2) = F_{Y_2}(x_2) - F(x_1,x_2), \tag{10} \]
\[ F^{++}(x_1,x_2) = F(x_1,x_2) - F_{Y_1}(x_1) - F_{Y_2}(x_2) + 1. \tag{11} \]

If we know $F(x_1,x_2)$ and if this function is continuously differentiable with respect to $x_1$ and $x_2$, then
we know $f(x_1,x_2)$, because $f(x_1,x_2) = \frac{\partial^2}{\partial x_1 \partial x_2} F(x_1,x_2)$. In fact, combined with the formulas
above, and (5), this gives us four different inversion formulas to obtain $f$ and $F$ from $g$, as is
stated in the following Lemma.
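Lemma 2.1 We have

\[ F^{--}(x_1,x_2) = \sum_{i=0}^{\infty}\sum_{j=0}^{\infty} g(x_1-i,\, x_2-j), \tag{12} \]
\[ F^{-+}(x_1,x_2) = \sum_{i=0}^{\infty}\sum_{j=1}^{\infty} g(x_1-i,\, x_2+j), \tag{13} \]
\[ F^{+-}(x_1,x_2) = \sum_{i=1}^{\infty}\sum_{j=0}^{\infty} g(x_1+i,\, x_2-j), \tag{14} \]
\[ F^{++}(x_1,x_2) = \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} g(x_1+i,\, x_2+j). \tag{15} \]

Assume that $\lim_{x_1\to\pm\infty} f(x_1,x_2) = 0$ and $\lim_{x_2\to\pm\infty} f(x_1,x_2) = 0$. Furthermore, assume that
$g(x_1,x_2)$ is twice mixed continuously differentiable with respect to $x_1$ and $x_2$. Then there are four inversion
formulas to recover $f$ from $g$. We have

\[ f(x_1,x_2) = \sum_{i=0}^{\infty}\sum_{j=0}^{\infty} \frac{\partial^2}{\partial x_1 \partial x_2}\, g(x_1-i,\, x_2-j), \tag{16} \]
\[ f(x_1,x_2) = -\sum_{i=0}^{\infty}\sum_{j=1}^{\infty} \frac{\partial^2}{\partial x_1 \partial x_2}\, g(x_1-i,\, x_2+j), \tag{17} \]
\[ f(x_1,x_2) = -\sum_{i=1}^{\infty}\sum_{j=0}^{\infty} \frac{\partial^2}{\partial x_1 \partial x_2}\, g(x_1+i,\, x_2-j), \tag{18} \]
\[ f(x_1,x_2) = \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \frac{\partial^2}{\partial x_1 \partial x_2}\, g(x_1+i,\, x_2+j). \tag{19} \]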



To get some more insight in these inversion formulas note that (5) can be interpreted as a
probability for $Y$ (under $F$). We have

\[ g(x_1,x_2) = P_F\big(Y \in (x_1-1, x_1] \times (x_2-1, x_2]\big). \]

So $g(x_1,x_2)$ is equal to the probability that $Y$ belongs to a specific square $(x_1-1, x_1] \times (x_2-1, x_2]$.
Adding up over suitable squares we then get the probability that $Y$ belongs to a specific
quadrant with a given vertex. For a formal proof see Chapter 6.
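The quadrant interpretation of (12)-(15) can be illustrated numerically. The sketch below assumes, only as an example, that $Y_1$ and $Y_2$ are independent standard normal (not a choice made in the paper), builds $g$ from (5), and sums it over the shifted squares. The infinite sums are truncated at a hypothetical `terms` shifts, which is harmless here because the normal tails are numerically zero far out.

```python
import math

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def F(x1, x2):
    """Assumed example: Y1, Y2 independent standard normal."""
    return Phi(x1) * Phi(x2)

def g(x1, x2):
    """Density of the observations via (5)."""
    return F(x1, x2) - F(x1, x2 - 1) - F(x1 - 1, x2) + F(x1 - 1, x2 - 1)

def quadrant_sums(x1, x2, terms=40):
    """Truncated versions of the double sums (12)-(15)."""
    mm = sum(g(x1 - i, x2 - j) for i in range(0, terms) for j in range(0, terms))
    mp = sum(g(x1 - i, x2 + j) for i in range(0, terms) for j in range(1, terms))
    pm = sum(g(x1 + i, x2 - j) for i in range(1, terms) for j in range(0, terms))
    pp = sum(g(x1 + i, x2 + j) for i in range(1, terms) for j in range(1, terms))
    return mm, mp, pm, pp

if __name__ == "__main__":
    x1, x2 = 0.4, -0.3
    print(quadrant_sums(x1, x2))
    print(Phi(x1) * Phi(x2), Phi(x1) * (1 - Phi(x2)),
          (1 - Phi(x1)) * Phi(x2), (1 - Phi(x1)) * (1 - Phi(x2)))
```

The first printed tuple (the sums) should match the second (the quadrant probabilities computed directly) up to rounding.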



3 Estimation of the density function 

In the previous chapter we have derived inversion formulas that express the density $f$ in terms
of the density $g$ of the observations. Now we can use an estimator of $g$, for which we have
observations, to estimate $f$. For an arbitrary density that is not of the form (5), the inversions
will in general not yield distribution functions or densities, nor will they coincide. This typically
happens if we estimate $g$.

Figure 1: $F^{++}(x_1,x_2) = \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} P_F\big(Y \in (x_1+i-1, x_1+i] \times (x_2+j-1, x_2+j]\big)$.

We use kernel smoothing but of course other estimators can be used as well. Let us introduce
a bivariate kernel density estimator with bivariate kernel function $w$ and bandwidth $h > 0$.
The estimator $g_{nh}$ of $g$ is given by

\[ g_{nh}(x_1,x_2) = \frac{1}{nh^2} \sum_{k=1}^{n} w\Big(\frac{x_1 - X_{k1}}{h},\, \frac{x_2 - X_{k2}}{h}\Big). \tag{20} \]

Usually, $w$ is chosen to be a bivariate probability density function. This way it is ensured that
$g_{nh}$ is also a density. See for instance Silverman (1986) and Wand and Jones (1995).

We impose the following condition on the kernel function.
Condition W

The function $w$ is a probability density function on $\mathbb{R}^2$ with support $[-1, 1] \times [-1, 1]$.
Furthermore, we will use a product kernel $w(u_1, u_2) = w_1(u_1) w_2(u_2)$, where $w_i(u_i)$, with $i \in \{1, 2\}$,
denotes a continuously differentiable univariate symmetric probability density function.

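We now substitute the kernel estimator in the four inversion formulas of Lemma 2.1. We
derive the estimator $f^{++}_{nh}(x_1,x_2)$ as follows. The other three estimators follow similarly. Define
$w_i'(u) := \frac{d}{du} w_i(u)$, $i = 1, 2$. Lemma 2.1 in combination with $\frac{\partial^2}{\partial x_1 \partial x_2} F^{++}(x_1,x_2) = f(x_1,x_2)$ gives

\[ \begin{aligned}
f^{++}_{nh}(x_1,x_2) &= \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \frac{\partial^2}{\partial x_1 \partial x_2}\, g_{nh}(x_1+i,\, x_2+j) \\
&= \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \frac{\partial^2}{\partial x_1 \partial x_2}\, \frac{1}{nh^2} \sum_{k=1}^{n} w\Big(\frac{x_1+i-X_{k1}}{h},\, \frac{x_2+j-X_{k2}}{h}\Big) \\
&= \frac{1}{nh^4} \sum_{k=1}^{n}\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \frac{\partial^2 w}{\partial u_1 \partial u_2}\Big(\frac{x_1+i-X_{k1}}{h},\, \frac{x_2+j-X_{k2}}{h}\Big) \\
&= \frac{1}{nh^4} \sum_{k=1}^{n}\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} w_1'\Big(\frac{x_1+i-X_{k1}}{h}\Big)\, w_2'\Big(\frac{x_2+j-X_{k2}}{h}\Big).
\end{aligned} \]

Note that, because of the bounded support of $w$, the sum is in fact a finite sum. In the last
step we used the fact that $w$ is a product kernel, and thus $\frac{\partial^2}{\partial u_1 \partial u_2} w(u_1,u_2) = w_1'(u_1) w_2'(u_2)$.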



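The four kernel estimators of the density are given by

\[ \begin{aligned}
f^{--}_{nh}(x_1,x_2) &= \frac{1}{nh^4} \sum_{k=1}^{n}\sum_{i=0}^{\infty}\sum_{j=0}^{\infty} w_1'\Big(\frac{x_1-i-X_{k1}}{h}\Big)\, w_2'\Big(\frac{x_2-j-X_{k2}}{h}\Big), \\
f^{-+}_{nh}(x_1,x_2) &= -\frac{1}{nh^4} \sum_{k=1}^{n}\sum_{i=0}^{\infty}\sum_{j=1}^{\infty} w_1'\Big(\frac{x_1-i-X_{k1}}{h}\Big)\, w_2'\Big(\frac{x_2+j-X_{k2}}{h}\Big), \\
f^{+-}_{nh}(x_1,x_2) &= -\frac{1}{nh^4} \sum_{k=1}^{n}\sum_{i=1}^{\infty}\sum_{j=0}^{\infty} w_1'\Big(\frac{x_1+i-X_{k1}}{h}\Big)\, w_2'\Big(\frac{x_2-j-X_{k2}}{h}\Big), \\
f^{++}_{nh}(x_1,x_2) &= \frac{1}{nh^4} \sum_{k=1}^{n}\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} w_1'\Big(\frac{x_1+i-X_{k1}}{h}\Big)\, w_2'\Big(\frac{x_2+j-X_{k2}}{h}\Big).
\end{aligned} \]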
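A direct, unbinned evaluation of the four estimators at a single point might look as follows. This is a sketch under stated assumptions rather than the authors' implementation: it uses the biweight kernel for both coordinates, truncates the shift sums at the largest shift that can still fall inside the kernel support, and the names `biweight_deriv`, `shifted_sums` and `four_estimators` are hypothetical.

```python
import numpy as np

def biweight_deriv(u):
    """Derivative of the biweight kernel (15/16)(1 - u^2)^2 on [-1, 1]."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, -3.75 * u * (1.0 - u**2), 0.0)

def shifted_sums(x, data, h, n_shifts):
    """For each data point X_k return
    sum_{i>=0} w'((x - i - X_k)/h) and sum_{i>=1} w'((x + i - X_k)/h)."""
    i = np.arange(0, n_shifts + 1)
    minus = biweight_deriv((x - i[None, :] - data[:, None]) / h).sum(axis=1)
    plus = biweight_deriv((x + i[None, 1:] - data[:, None]) / h).sum(axis=1)
    return minus, plus

def four_estimators(x1, x2, X, h):
    """Evaluate f^{--}, f^{-+}, f^{+-}, f^{++} at the point (x1, x2)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # A shift i only contributes if |x +/- i - X_k| <= h, so this bound suffices.
    n_shifts = int(np.ceil(np.max(np.abs(X - np.array([x1, x2]))) + h))
    a_minus, a_plus = shifted_sums(x1, X[:, 0], h, n_shifts)
    b_minus, b_plus = shifted_sums(x2, X[:, 1], h, n_shifts)
    c = 1.0 / (n * h**4)
    return {
        "--": c * np.sum(a_minus * b_minus),
        "-+": -c * np.sum(a_minus * b_plus),
        "+-": -c * np.sum(a_plus * b_minus),
        "++": c * np.sum(a_plus * b_plus),
    }

if __name__ == "__main__":
    print(four_estimators(1.25, 1.25, [[1.0, 1.0]], 0.5))
```

Because the kernel is a product kernel, the double sum over $(i, j)$ factorises into a product of two single sums per observation, which is what `shifted_sums` exploits.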

Next we introduce a convex combination of the four previous estimators. Write

\[ f^{t}_{nh}(x_1,x_2) = t_1 f^{--}_{nh}(x_1,x_2) + t_2 f^{-+}_{nh}(x_1,x_2) + t_3 f^{+-}_{nh}(x_1,x_2) + t_4 f^{++}_{nh}(x_1,x_2), \tag{21} \]

where $t = (t_1,t_2,t_3,t_4)$ and $t_1+t_2+t_3+t_4 = 1$. For suitable choices of $t_1, t_2, t_3, t_4$ this combination
will turn out to have better properties than any of the estimators separately. Notice that when
we set $t_1$, $t_2$, $t_3$ or $t_4$ equal to one and the others equal to zero, we get results for $f^{--}_{nh}$, $f^{-+}_{nh}$,
$f^{+-}_{nh}$ or $f^{++}_{nh}$ individually.

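Theorem 3.1 Assume that Condition W is satisfied, that $f$ is bounded, and that
$\lim_{x_1\to\pm\infty} f(x_1,x_2) = \lim_{x_2\to\pm\infty} f(x_1,x_2) = 0$. If $f$ is twice continuously differentiable on a
neighborhood of $x = (x_1,x_2)$ then, as $n \to \infty$, $h \to 0$, $nh \to \infty$, we have

\[ E f^{t}_{nh}(x_1,x_2) = f(x_1,x_2) + \frac{1}{2} h^2 \Big( \int_{-\infty}^{\infty} z^2 w_1(z)\,dz\, f_{11}(x_1,x_2) + \int_{-\infty}^{\infty} z^2 w_2(z)\,dz\, f_{22}(x_1,x_2) \Big) + o(h^2). \tag{22} \]

Furthermore, as $n \to \infty$, $h \to 0$, $nh \to \infty$, we have

\[ \mathrm{Var}\big(f^{t}_{nh}(x_1,x_2)\big) = \frac{1}{nh^6}\, B(x_1,x_2,t_1,t_2,t_3,t_4) \int w_1'(z)^2\,dz \int w_2'(z)^2\,dz + o(n^{-1}h^{-6}), \tag{23} \]

where

\[ B(x_1,x_2,t_1,t_2,t_3,t_4) = \big(t_1^2 F^{--} + t_2^2 F^{-+} + t_3^2 F^{+-} + t_4^2 F^{++}\big)(x_1,x_2). \tag{24} \]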
In the proof of the theorem we will see that the expectation of $f^{t}_{nh}(x_1,x_2)$ is the same whatever
convex combination we choose. Lemma 3.2 gives the weights that minimize the leading
term in the variance (23).

Lemma 3.2 Assume that $(x_1,x_2)$ is an interior point of the support of $f$. The weights $t_1$, $t_2$, $t_3$
and $t_4$, with $t_1+t_2+t_3+t_4 = 1$, that minimize the leading term in the variance (23) are
denoted by $\bar t_1(x_1,x_2)$, $\bar t_2(x_1,x_2)$, $\bar t_3(x_1,x_2)$ and $\bar t_4(x_1,x_2)$ and they are equal to

\[ \begin{aligned}
\bar t_1(x_1,x_2) &= F^{-+,+-,++}(x_1,x_2)\, A(x_1,x_2), \\
\bar t_2(x_1,x_2) &= F^{--,+-,++}(x_1,x_2)\, A(x_1,x_2), \\
\bar t_3(x_1,x_2) &= F^{--,-+,++}(x_1,x_2)\, A(x_1,x_2), \\
\bar t_4(x_1,x_2) &= F^{--,-+,+-}(x_1,x_2)\, A(x_1,x_2).
\end{aligned} \]

The resulting variance of this optimal convex combination is then equal to

\[ \mathrm{Var}\big(f^{\bar t}_{nh}(x_1,x_2)\big) = \frac{1}{nh^6}\, A(x_1,x_2)\, C(x_1,x_2) \int w_1'(z)^2\,dz \int w_2'(z)^2\,dz + o(n^{-1}h^{-6}). \tag{25} \]

Here

\[ A(x_1,x_2) := \big(F^{-+,+-,++} + F^{--,+-,++} + F^{--,-+,++} + F^{--,-+,+-}\big)^{-1}(x_1,x_2), \tag{26} \]

where, for $a_1, a_2, b_1, b_2, c_1, c_2 \in \{-, +\}$,

\[ F^{a_1a_2,\,b_1b_2,\,c_1c_2}(x_1,x_2) := F^{a_1a_2}(x_1,x_2)\, F^{b_1b_2}(x_1,x_2)\, F^{c_1c_2}(x_1,x_2), \tag{27} \]

and

\[ C(x_1,x_2) := F^{--}(x_1,x_2)\, F^{-+}(x_1,x_2)\, F^{+-}(x_1,x_2)\, F^{++}(x_1,x_2). \tag{28} \]

Proof

First note that the weights are well defined since the fact that $(x_1,x_2)$ is an interior point of
the support of $f$ implies that $F^{--}(x_1,x_2)$, $F^{-+}(x_1,x_2)$, $F^{+-}(x_1,x_2)$ and $F^{++}(x_1,x_2)$ are strictly
positive. The lower bound now follows from Lemma 6.2 in Chapter 6. □
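The optimal weights of Lemma 3.2 and the variance factor $A \cdot C$ are easy to verify numerically. The following sketch uses hypothetical function names and arbitrary illustrative quadrant probabilities (they sum to one, as the four quadrant probabilities must); it checks that the weighted sum in (24) evaluated at the optimal weights equals $A \cdot C$.

```python
def optimal_weights(F_mm, F_mp, F_pm, F_pp):
    """Optimal weights of Lemma 3.2 from the four (strictly positive)
    quadrant probabilities F^{--}, F^{-+}, F^{+-}, F^{++}."""
    # Triple products as in (27): each weight omits "its own" quadrant.
    triples = (F_mp * F_pm * F_pp,
               F_mm * F_pm * F_pp,
               F_mm * F_mp * F_pp,
               F_mm * F_mp * F_pm)
    A = 1.0 / sum(triples)              # (26)
    C = F_mm * F_mp * F_pm * F_pp       # (28)
    weights = tuple(tr * A for tr in triples)
    return weights, A, C

def variance_factor(weights, Fs):
    """B of (24): sum of t_i^2 times the matching quadrant probability."""
    return sum(t * t * F for t, F in zip(weights, Fs))

if __name__ == "__main__":
    Fs = (0.1, 0.2, 0.3, 0.4)           # illustrative values only
    weights, A, C = optimal_weights(*Fs)
    print(weights, A * C, variance_factor(weights, Fs))
```

Each weight is proportional to the reciprocal of the corresponding quadrant probability, so the estimator whose variance is smallest at $(x_1,x_2)$ receives the largest weight.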

Note that in general, of course, we do not know $F$. However, in Section 4 we show that we
can estimate $F^{--}(x_1,x_2)$, $F^{-+}(x_1,x_2)$, $F^{+-}(x_1,x_2)$ and $F^{++}(x_1,x_2)$, again using the inversion
formulas of Lemma 2.1. This will lead to estimates of the optimal weights. We then prove that
the estimator with estimated weights shares the properties of Theorem 3.1 with the optimal
weights.

4 The final estimator with estimated optimal weights 

Let us write $\hat t_n(x_1,x_2) = (\hat t_{n1}(x_1,x_2), \ldots, \hat t_{n4}(x_1,x_2))$ for a vector of estimated weights. The
next theorem shows that under some conditions on these estimators the limit behaviour of
$f^{\hat t_n}_{nh}(x_1,x_2)$ resembles the optimal limit behaviour of the estimator $f^{\bar t}_{nh}(x_1,x_2)$.

Theorem 4.1 Assume that Condition W is satisfied, that $f$ is bounded, and that
$\lim_{x_1\to\pm\infty} f(x_1,x_2) = \lim_{x_2\to\pm\infty} f(x_1,x_2) = 0$.
Assume for $i = 1, \ldots, 4$,

\[ E\big(\hat t_{ni}(x_1,x_2) - \bar t_i(x_1,x_2)\big)^2 = o(nh^{10}). \tag{29} \]

If $f$ is twice continuously differentiable on a neighborhood of $x = (x_1,x_2)$ then, as $n \to \infty$, $h \to 0$,
$nh \to \infty$, we have

\[ E f^{\hat t_n}_{nh}(x_1,x_2) = f(x_1,x_2) + \frac{1}{2} h^2 \Big( \int_{-\infty}^{\infty} z^2 w_1(z)\,dz\, f_{11}(x_1,x_2) + \int_{-\infty}^{\infty} z^2 w_2(z)\,dz\, f_{22}(x_1,x_2) \Big) + o(h^2). \tag{30} \]

Assume for i = 1, . . . , 4, 

E {ini{Xi,X2) - ti{xi,X2))^ = 0(1). (31) 

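Then, as $n \to \infty$, $h \to 0$, $nh \to \infty$, we have

\[ \mathrm{Var}\big(f^{\hat t_n}_{nh}(x_1,x_2)\big) = \frac{1}{nh^6}\, \sigma(x_1,x_2)^2 + o(n^{-1}h^{-6}), \tag{32} \]

where, with the notation of Lemma 3.2, $\sigma(x_1,x_2)^2$ is defined by

\[ \sigma(x_1,x_2)^2 = A(x_1,x_2)\, C(x_1,x_2) \int w_1'(z_1)^2\,dz_1 \int w_2'(z_2)^2\,dz_2. \tag{33} \]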
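Assume for $i = 1, \ldots, 4$,

\[ E\big(\hat t_{ni}(x_1,x_2) - \bar t_i(x_1,x_2)\big)^2 = o(1). \tag{34} \]

Then the estimator is asymptotically normally distributed. We have, as $n \to \infty$, $h \to 0$, $nh \to \infty$,

\[ \sqrt{nh^6}\, \big( f^{\hat t_n}_{nh}(x_1,x_2) - E f^{\hat t_n}_{nh}(x_1,x_2) \big) \xrightarrow{D} N\big(0, \sigma(x_1,x_2)^2\big). \tag{35} \]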

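Let us next construct suitable estimators of the weights based on the estimators of $F^{--}$, $F^{-+}$,
$F^{+-}$ and $F^{++}$. As in estimation of the density we can plug in (20) into the inversion
formulas for $F$ in Lemma 2.1 and get kernel estimators of $F^{--}(x_1,x_2)$, $F^{-+}(x_1,x_2)$, $F^{+-}(x_1,x_2)$ and
$F^{++}(x_1,x_2)$. We get four estimators, given by

\[ \begin{aligned}
F^{--}_{nh}(x_1,x_2) &= \frac{1}{nh^2} \sum_{k=1}^{n}\sum_{i=0}^{\infty}\sum_{j=0}^{\infty} w_1\Big(\frac{x_1-i-X_{k1}}{h}\Big)\, w_2\Big(\frac{x_2-j-X_{k2}}{h}\Big), \\
F^{-+}_{nh}(x_1,x_2) &= \frac{1}{nh^2} \sum_{k=1}^{n}\sum_{i=0}^{\infty}\sum_{j=1}^{\infty} w_1\Big(\frac{x_1-i-X_{k1}}{h}\Big)\, w_2\Big(\frac{x_2+j-X_{k2}}{h}\Big), \\
F^{+-}_{nh}(x_1,x_2) &= \frac{1}{nh^2} \sum_{k=1}^{n}\sum_{i=1}^{\infty}\sum_{j=0}^{\infty} w_1\Big(\frac{x_1+i-X_{k1}}{h}\Big)\, w_2\Big(\frac{x_2-j-X_{k2}}{h}\Big), \\
F^{++}_{nh}(x_1,x_2) &= \frac{1}{nh^2} \sum_{k=1}^{n}\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} w_1\Big(\frac{x_1+i-X_{k1}}{h}\Big)\, w_2\Big(\frac{x_2+j-X_{k2}}{h}\Big).
\end{aligned} \tag{36} \]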
The following theorem establishes the asymptotic bias and variance of these four estimators.
In the sequel we adopt the notation $F^{--}_{11} = \frac{\partial^2}{\partial x_1^2} F^{--}$, $F^{--}_{12} = \frac{\partial^2}{\partial x_1 \partial x_2} F^{--}$, etc., as for the
density $f$. The proof is very similar to the proof of Theorem 3.1 and is therefore omitted. See
Benesova et al. (2011) for a complete proof.

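Theorem 4.2 Assume that Condition W is satisfied. Then, as $n \to \infty$, $h \to 0$, $nh \to \infty$ we
have

\[ \begin{aligned}
E F^{--}_{nh}(x_1,x_2) &= F^{--}(x_1,x_2) + \frac{1}{2}h^2 \Big( \int_{-\infty}^{\infty} z^2 w_1(z)\,dz\, F^{--}_{11}(x_1,x_2) + \int_{-\infty}^{\infty} z^2 w_2(z)\,dz\, F^{--}_{22}(x_1,x_2) \Big) + o(h^2), \\
E F^{-+}_{nh}(x_1,x_2) &= F^{-+}(x_1,x_2) + \frac{1}{2}h^2 \Big( \int_{-\infty}^{\infty} z^2 w_1(z)\,dz\, F^{-+}_{11}(x_1,x_2) + \int_{-\infty}^{\infty} z^2 w_2(z)\,dz\, F^{-+}_{22}(x_1,x_2) \Big) + o(h^2), \\
E F^{+-}_{nh}(x_1,x_2) &= F^{+-}(x_1,x_2) + \frac{1}{2}h^2 \Big( \int_{-\infty}^{\infty} z^2 w_1(z)\,dz\, F^{+-}_{11}(x_1,x_2) + \int_{-\infty}^{\infty} z^2 w_2(z)\,dz\, F^{+-}_{22}(x_1,x_2) \Big) + o(h^2), \\
E F^{++}_{nh}(x_1,x_2) &= F^{++}(x_1,x_2) + \frac{1}{2}h^2 \Big( \int_{-\infty}^{\infty} z^2 w_1(z)\,dz\, F^{++}_{11}(x_1,x_2) + \int_{-\infty}^{\infty} z^2 w_2(z)\,dz\, F^{++}_{22}(x_1,x_2) \Big) + o(h^2).
\end{aligned} \]

For the variances we have

\[ \begin{aligned}
\mathrm{Var}\big(F^{--}_{nh}(x_1,x_2)\big) &= F^{--}(x_1,x_2)\, \frac{1}{nh^2} \int_{-1}^{1} w_1^2(z)\,dz \int_{-1}^{1} w_2^2(z)\,dz + o\Big(\frac{1}{nh^2}\Big), \\
\mathrm{Var}\big(F^{-+}_{nh}(x_1,x_2)\big) &= F^{-+}(x_1,x_2)\, \frac{1}{nh^2} \int_{-1}^{1} w_1^2(z)\,dz \int_{-1}^{1} w_2^2(z)\,dz + o\Big(\frac{1}{nh^2}\Big), \\
\mathrm{Var}\big(F^{+-}_{nh}(x_1,x_2)\big) &= F^{+-}(x_1,x_2)\, \frac{1}{nh^2} \int_{-1}^{1} w_1^2(z)\,dz \int_{-1}^{1} w_2^2(z)\,dz + o\Big(\frac{1}{nh^2}\Big), \\
\mathrm{Var}\big(F^{++}_{nh}(x_1,x_2)\big) &= F^{++}(x_1,x_2)\, \frac{1}{nh^2} \int_{-1}^{1} w_1^2(z)\,dz \int_{-1}^{1} w_2^2(z)\,dz + o\Big(\frac{1}{nh^2}\Big).
\end{aligned} \]

For the proof of this theorem see Chapter 6.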



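Next we write the optimal weights of Lemma 3.2 in terms of functions $t_i$ defined by

\[ \bar t_i(x_1,x_2) = t_i\big(F^{--}(x_1,x_2), F^{-+}(x_1,x_2), F^{+-}(x_1,x_2), F^{++}(x_1,x_2)\big), \quad i = 1, \ldots, 4. \]

Let $(\epsilon_n)$ denote a sequence of numbers with $0 < \epsilon_n < 1$ and $\epsilon_n \to 0$ as $n \to \infty$. Then
define truncated versions of the estimators $F^{--}_{nh}(x_1,x_2)$, $F^{-+}_{nh}(x_1,x_2)$, $F^{+-}_{nh}(x_1,x_2)$
and $F^{++}_{nh}(x_1,x_2)$ by

\[ \begin{aligned}
\tilde F^{--}_{nh}(x_1,x_2) &= \min\big(\max(F^{--}_{nh}(x_1,x_2), \epsilon_n), 1\big), \\
\tilde F^{-+}_{nh}(x_1,x_2) &= \min\big(\max(F^{-+}_{nh}(x_1,x_2), \epsilon_n), 1\big), \\
\tilde F^{+-}_{nh}(x_1,x_2) &= \min\big(\max(F^{+-}_{nh}(x_1,x_2), \epsilon_n), 1\big), \\
\tilde F^{++}_{nh}(x_1,x_2) &= \min\big(\max(F^{++}_{nh}(x_1,x_2), \epsilon_n), 1\big).
\end{aligned} \]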
Since the bandwidth used in the estimators of the weights can in general be different from the
bandwidth $h$ used in the estimator of $f$, we will denote this bandwidth by $\tilde h$. We now obtain
estimators of the weights by plugging in these estimators. We get

\[ \hat t_{ni}(x_1,x_2) = t_i\big(\tilde F^{--}_{n\tilde h}(x_1,x_2), \tilde F^{-+}_{n\tilde h}(x_1,x_2), \tilde F^{+-}_{n\tilde h}(x_1,x_2), \tilde F^{++}_{n\tilde h}(x_1,x_2)\big), \quad i = 1, \ldots, 4. \]

The next lemma shows that these estimators, with a suitable bandwidth, can be used to estimate
the optimal weights without disturbing the asymptotics of Theorem 3.1.

Lemma 4.3 If $h \sim n^{-1/10}$, $\epsilon_n = 1/\log n$, and if we use a bandwidth $\tilde h$ of the form $\tilde h = c n^{-1/6}$,
where $c$ is a constant, then the estimators

\[ \hat t_{ni}(x_1,x_2) = t_i\big(\tilde F^{--}_{n\tilde h}(x_1,x_2), \tilde F^{-+}_{n\tilde h}(x_1,x_2), \tilde F^{+-}_{n\tilde h}(x_1,x_2), \tilde F^{++}_{n\tilde h}(x_1,x_2)\big) \]

satisfy (29), (31) and (34).
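The truncation step and the plug-in weights can be sketched as follows. This is an illustration under assumptions, not the authors' code: the function names are hypothetical, the raw estimates passed in are made-up numbers, and the weights are written in the equivalent form "reciprocal of each truncated estimate, normalised to sum to one", which coincides with the triple-product form of Lemma 3.2.

```python
import math

def truncate(F_hat, eps):
    """Truncated estimator min(max(F_hat, eps), 1)."""
    return min(max(F_hat, eps), 1.0)

def estimated_weights(F_hats, n):
    """Plug-in weights from four raw estimates of F^{--}, F^{-+}, F^{+-}, F^{++},
    using the truncation level eps_n = 1 / log(n) of Lemma 4.3 (requires n > e)."""
    eps = 1.0 / math.log(n)
    F_trunc = [truncate(F, eps) for F in F_hats]
    inverse = [1.0 / F for F in F_trunc]
    total = sum(inverse)
    return [v / total for v in inverse], F_trunc

if __name__ == "__main__":
    # Illustrative raw kernel estimates; they may fall outside [0, 1].
    weights, F_trunc = estimated_weights([-0.05, 0.2, 0.3, 1.4], n=1000)
    print(F_trunc)
    print(weights)
```

The truncation from below keeps the reciprocals bounded, so the estimated weights are always well defined even when a raw kernel estimate is zero or negative.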

Remark 4.4 If we compare the performance of our final estimator with estimated optimal
weights to the performance of the four individual estimators then we see that the first order of
the expectation is the same. The variance of the combined estimator contains the term $C(x_1,x_2)$
which is equal to the product of $F^{--}(x_1,x_2)$, $F^{-+}(x_1,x_2)$, $F^{+-}(x_1,x_2)$ and $F^{++}(x_1,x_2)$. This
shows that the variance is small along the edge of the support of $f$. By Theorem 3.1 the
variance of, for instance, $f^{--}_{nh}(x_1,x_2)$ is proportional to $F^{--}(x_1,x_2)$. So this estimator will
perform better in the lower left of the support of $f$ than it will in the other part. By using the
estimated optimal convex combination the worse behavior of the four individual estimators in
certain areas is reduced.



Remark 4.5 Since in the theorems we use a bivariate kernel function $w$ which is the product
of two different univariate density functions $w_1$ and $w_2$, in fact we allow different bandwidths
for the two coordinates, provided the bandwidths are of the same order. Writing $h_1 = h$, $h_2 =
ch$, $w_1 = w$ and $w_2 = w(\cdot/c)/c$, for some $c > 0$, and writing $f_{nh_1h_2}$ for the resulting estimator,
we get the following leading terms in the expansions of its bias and variance in Theorem 3.1

\[ \frac{1}{2} \int_{-\infty}^{\infty} z^2 w(z)\,dz\, \big( h_1^2 f_{11}(x_1,x_2) + h_2^2 f_{22}(x_1,x_2) \big) \tag{37} \]

and

\[ \frac{1}{n h_1^3 h_2^3}\, B(x_1,x_2,t_1,t_2,t_3,t_4) \Big( \int w'(z)^2\,dz \Big)^2. \tag{38} \]

The subsequent theorems can be likewise adapted to different bandwidths.



Remark 4.6 If we minimize the pointwise asymptotic mean squared error of $f^{\hat t_n}_{nh}(x_1,x_2)$ and
thus balance its asymptotic squared bias, of order $h^4$, and its asymptotic variance, of order $n^{-1}h^{-6}$, given by Theorem 4.1, then
we see that the optimal bandwidth is of order $n^{-1/10}$. The corresponding mean squared error is
then of order $n^{-2/5}$. This of course raises the problem of bandwidth selection which, important
as it is for applications, we will not pursue here.
Remark 4.7 In the proofs we see that the bias of our final estimator is asymptotically of the
same form as the bias of a bivariate kernel density estimator based on direct observations. That 
means that, if the smoothness assumptions on the density f are strengthened, bias reduction 
techniques, such as for instance higher order kernels or even super kernels, can be used to 
increase the rate of convergence. 

Remark 4.8 The construction as presented here for bivariate data can in principle also be 
done for arbitrary dimension d. For dimension one we have to combine two inversion formulas 
as shown in Van Es (2011). In the present paper, for dimension two, we combine four inversion 
formulas, and for arbitrary dimension $d$ a combination of $2^d$ inversions has to be accomplished.
Of course the complexity of the estimator will increase rapidly with growing dimension. 

5 Simulated examples 

To illustrate the estimator we have simulated two examples. In the first example the density $f$ is
unimodal. In the second example $f$ is a mixture of two unimodal bivariate densities, rendering
it bimodal. In the first example $f$ is concentrated on the square $[0.25, 1.75] \times [0.25, 1.75]$. In
the second example $f$ is concentrated on the square $[0.2, 1.8] \times [0.2, 1.8]$. This means that both
deconvolution problems are not at all trivial.

To speed up computations we have followed the bivariate binning technique as advised in
Wand (1994). For the $x$ and $y$ coordinates we have chosen a grid of 500 points between -1
and 4. We have used a product kernel based on the so called biweight kernel given by

\[ w_1(u) = w_2(u) = \frac{15}{16}\,(1-u^2)^2\, I_{[-1,1]}(u). \tag{39} \]

Example 5.1 In our first example $f$ is the density of the random vector $(Y_1, Y_2)$, where $Y_1$ and
$Y_2$ are two independent random variables that each have a certain shifted and rescaled beta
distribution. To be more specific $Y_i = 0.25 + 1.5V_i$, $i = 1, 2$, where the $V_i$ are independent and
both Beta(3,3) distributed. We have simulated 1000 values so $n = 1000$. The bandwidth $h$,
chosen by hand, is equal to 0.5.

The true density $f$ and its estimate are given in Figure 2. The difference between the true
density and the estimate is plotted in Figure 3. The right plot in Figure 3 shows $f^{+-}_{nh}$. Clearly
this estimate is best in the $+-$ quadrant, as predicted by the theory.

Example 5.2 In our second example $f$ is the density of the random vector $(Y_1, Y_2)$, where $Y_1$
and $Y_2$ are dependent random variables with a bimodal distribution. The distribution of the
vector is a mixture of two distributions like the one in Example 5.1. The values of the $Y$'s are
generated as follows. With $V_1$ and $V_2$ having the same distribution as in the previous example
the $Y$ values are given by

with probability 2/5,

with probability 3/5.
Figure 2: Left: the true density. Right: the estimate.



We have simulated 5000 values so $n = 5000$. The bandwidth $h$, chosen by hand, is equal to
0.35.

The true density $f$ and its estimate are given in Figure 4. The difference between the true
density and the estimate is plotted in Figure 5. The right plot in Figure 5 shows $f^{-+}_{nh}$. Clearly
this estimate is best in the $-+$ quadrant, as predicted by the theory.
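The data-generating step of Example 5.1 can be sketched in a few lines. This is only an illustration of the model as described in the text: the function name, the seed, and the use of NumPy's random generator are assumptions, and the binning and estimation steps are not reproduced here.

```python
import numpy as np

def simulate_example_5_1(n=1000, seed=0):
    """Simulate X = Y + Z with Y_i = 0.25 + 1.5 V_i, V_i ~ Beta(3, 3) independent,
    and Z uniform on the unit square. Returns (X, Y); only X would be observed."""
    rng = np.random.default_rng(seed)
    V = rng.beta(3.0, 3.0, size=(n, 2))
    Y = 0.25 + 1.5 * V                      # concentrated on [0.25, 1.75]^2
    Z = rng.uniform(0.0, 1.0, size=(n, 2))  # uniform measurement error
    return Y + Z, Y

if __name__ == "__main__":
    X, Y = simulate_example_5_1()
    print(X.shape, X.min(axis=0), X.max(axis=0))
```

The observations $X$ then lie in $[0.25, 2.75]^2$, well inside the grid range from -1 to 4 mentioned above.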

6 Proofs 

6.1 Proof of Lemma 2.1

Let us first derive the inversion formulas for $F(x_1,x_2)$. We sum $g(x_1-i,x_2) = F(x_1-i,x_2) -
F(x_1-i,x_2-1) - F(x_1-i-1,x_2) + F(x_1-i-1,x_2-1)$ over the first coordinate to obtain
two telescopic sums. Thus we get

\[ \begin{aligned}
\sum_{i=0}^{\infty} g(x_1-i,x_2)
&= \sum_{i=0}^{\infty} \{F(x_1-i,x_2) - F(x_1-i,x_2-1) - F(x_1-i-1,x_2) + F(x_1-i-1,x_2-1)\} \\
&= \sum_{i=0}^{\infty} \{F(x_1-i,x_2) - F(x_1-i-1,x_2)\} - \sum_{i=0}^{\infty} \{F(x_1-i,x_2-1) - F(x_1-i-1,x_2-1)\} \\
&= F(x_1,x_2) - F(x_1,x_2-1).
\end{aligned} \tag{40} \]

Here we used that $\lim_{i\to\infty} F(x_1-i,x_2) = \lim_{i\to\infty} F(x_1-i,x_2-1) = 0$, for $F$ is a bivariate
distribution function. Next, we sum over the second coordinate. Because we also have
$\lim_{j\to\infty} F(x_1,x_2-j) = 0$, we get

\[ \sum_{j=0}^{\infty}\sum_{i=0}^{\infty} g(x_1-i,x_2-j) = \sum_{j=0}^{\infty} \{F(x_1,x_2-j) - F(x_1,x_2-j-1)\} = F(x_1,x_2). \tag{41} \]

Because the terms are nonnegative, the order of summation can be interchanged and we have
shown (12). Thus we have found an expression for the unobservable probability distribution
function $F$ in terms of the observable density function $g$.

Figure 3: Left: the difference of the true density and the estimate. Right: $f^{+-}_{nh}$.

Above, we iterated over $-i$, so now let us determine what happens when we iterate over $+i$.
First, we write $g(x_1+i,x_2)$ as

\[ g(x_1+i,x_2) = F(x_1+i,x_2) - F(x_1+i,x_2-1) - F(x_1+i-1,x_2) + F(x_1+i-1,x_2-1). \tag{42} \]

Secondly, we take the sum over the first coordinate. Again we get two telescopic sums. Note
that $\lim_{i\to\infty} F(x_1+i,x_2) = F_{Y_2}(x_2)$ and $\lim_{i\to\infty} F(x_1+i,x_2-1) = F_{Y_2}(x_2-1)$, so we get

\[ \begin{aligned}
\sum_{i=1}^{\infty} g(x_1+i,x_2)
&= \sum_{i=1}^{\infty} \{F(x_1+i,x_2) - F(x_1+i,x_2-1) - F(x_1+i-1,x_2) + F(x_1+i-1,x_2-1)\} \\
&= \sum_{i=1}^{\infty} \{F(x_1+i,x_2) - F(x_1+i-1,x_2)\} + \sum_{i=1}^{\infty} \{F(x_1+i-1,x_2-1) - F(x_1+i,x_2-1)\} \\
&= F_{Y_2}(x_2) - F(x_1,x_2) + F(x_1,x_2-1) - F_{Y_2}(x_2-1).
\end{aligned} \tag{43} \]
Figure 4: n = 5000, h = 0.35. Left: the true density. Right: the estimate.

Thirdly, we sum over the second coordinate. Because $\lim_{j\to\infty} F_{Y_2}(x_2-j) = 0$, this results in

\[ \begin{aligned}
\sum_{j=0}^{\infty}\sum_{i=1}^{\infty} g(x_1+i,x_2-j)
&= \sum_{j=0}^{\infty} \{F_{Y_2}(x_2-j) - F(x_1,x_2-j) + F(x_1,x_2-j-1) - F_{Y_2}(x_2-j-1)\} \\
&= \sum_{j=0}^{\infty} \{F_{Y_2}(x_2-j) - F_{Y_2}(x_2-j-1)\} - \sum_{j=0}^{\infty} \{F(x_1,x_2-j) - F(x_1,x_2-j-1)\} \\
&= F_{Y_2}(x_2) - F(x_1,x_2) = F^{+-}(x_1,x_2).
\end{aligned} \tag{44} \]

Again, we can interchange the sums and we have shown (14). In similar fashion we can derive (13).

The last formula to recover $F(x_1,x_2)$ can be derived as follows. We begin with

\[ g(x_1+1,x_2+1) = F(x_1,x_2) - F(x_1,x_2+1) - F(x_1+1,x_2) + F(x_1+1,x_2+1). \tag{45} \]

Now sum over the first coordinate to obtain

\[ \sum_{i=1}^{\infty} g(x_1+i,x_2+1) = F(x_1,x_2) - F(x_1,x_2+1) - F_{Y_2}(x_2) + F_{Y_2}(x_2+1). \tag{46} \]

Summing over the second coordinate we get

\[ \sum_{j=1}^{\infty}\sum_{i=1}^{\infty} g(x_1+i,x_2+j) = F(x_1,x_2) - F_{Y_1}(x_1) - F_{Y_2}(x_2) + 1 = F^{++}(x_1,x_2). \tag{47} \]

Changing the order of summation again, we obtain (15).

Figure 5: Left: the difference of the true density and the estimate. Right: $f^{-+}_{nh}$.



The four inversion formulas for $f$ are derived in a similar fashion. From (5) we have

\[ \frac{\partial^2}{\partial x_1 \partial x_2}\, g(x_1,x_2) = f(x_1,x_2) - f(x_1,x_2-1) - f(x_1-1,x_2) + f(x_1-1,x_2-1). \]

Now, following equations (40) and (41), we obtain

\[ \sum_{i=0}^{\infty}\sum_{j=0}^{\infty} \frac{\partial^2}{\partial x_1 \partial x_2}\, g(x_1-i,x_2-j) = f(x_1,x_2). \]

Here we have used $\lim_{x_1\to-\infty} f(x_1,x_2) = 0$ and $\lim_{x_2\to-\infty} f(x_1,x_2) = 0$.
The other three inversion formulas follow similarly.



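6.2 Proof of Theorem 3.1

First we consider the estimator $f^{++}_{nh}$. We have

\[ \begin{aligned}
E f^{++}_{nh}(x_1,x_2)
&= E\, \frac{1}{nh^4} \sum_{k=1}^{n}\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} w_1'\Big(\frac{x_1+i-X_{k1}}{h}\Big)\, w_2'\Big(\frac{x_2+j-X_{k2}}{h}\Big) \\
&= \frac{1}{h^4}\, E \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} w_1'\Big(\frac{x_1+i-X_{11}}{h}\Big)\, w_2'\Big(\frac{x_2+j-X_{12}}{h}\Big) \\
&= \frac{1}{h^4} \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w_1'\Big(\frac{x_1+i-u_1}{h}\Big)\, w_2'\Big(\frac{x_2+j-u_2}{h}\Big)\, g(u_1,u_2)\,du_1\,du_2.
\end{aligned} \]

Note that interchanging integrals and sums is allowed because

\[ \frac{1}{h^4} \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \Big| w_1'\Big(\frac{x_1+i-u_1}{h}\Big) \Big|\, \Big| w_2'\Big(\frac{x_2+j-u_2}{h}\Big) \Big|\, g(u_1,u_2)\,du_1\,du_2 < \infty. \]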
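To check this, we first make the substitutions $v_1 := u_1 - i$ and $v_2 := u_2 - j$. Secondly, we
interchange the sums and integrals again, which is allowed because the integrand is nonnegative
(Fubini). We get

\[ \begin{aligned}
&\frac{1}{h^4} \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \Big| w_1'\Big(\frac{x_1-v_1}{h}\Big) \Big|\, \Big| w_2'\Big(\frac{x_2-v_2}{h}\Big) \Big|\, g(v_1+i,v_2+j)\,dv_1\,dv_2 \\
&\quad = \frac{1}{h^4} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \Big| w_1'\Big(\frac{x_1-v_1}{h}\Big) \Big|\, \Big| w_2'\Big(\frac{x_2-v_2}{h}\Big) \Big| \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} g(v_1+i,v_2+j)\,dv_1\,dv_2.
\end{aligned} \tag{51} \]

Thirdly, noting that $F^{++}(v_1,v_2) = \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} g(v_1+i,v_2+j)$ and that $F^{++}(v_1,v_2) \le 1$,
we obtain

\[ \begin{aligned}
&\frac{1}{h^4} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \Big| w_1'\Big(\frac{x_1-v_1}{h}\Big) \Big|\, \Big| w_2'\Big(\frac{x_2-v_2}{h}\Big) \Big|\, F^{++}(v_1,v_2)\,dv_1\,dv_2 \\
&\quad \le \frac{1}{h^4} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \Big| w_1'\Big(\frac{x_1-v_1}{h}\Big) \Big|\, \Big| w_2'\Big(\frac{x_2-v_2}{h}\Big) \Big|\,dv_1\,dv_2 < \infty.
\end{aligned} \tag{52} \]

Because $w_1'$ and $w_2'$ are bounded functions, and have bounded support, this integral is finite.
Thus our use of Fubini's Theorem is justified. Next we apply partial integration twice, yielding

\[ \begin{aligned}
E f^{++}_{nh}(x_1,x_2)
&= \frac{1}{h^4} \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w_1'\Big(\frac{x_1+i-u_1}{h}\Big)\, w_2'\Big(\frac{x_2+j-u_2}{h}\Big)\, g(u_1,u_2)\,du_1\,du_2 \\
&= \frac{1}{h^3} \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \int_{-\infty}^{\infty} w_2'\Big(\frac{x_2+j-u_2}{h}\Big) \Big( \int_{-\infty}^{\infty} w_1\Big(\frac{x_1+i-u_1}{h}\Big)\, \frac{\partial}{\partial u_1}\, g(u_1,u_2)\,du_1 \Big)\,du_2 \\
&= \frac{1}{h^2} \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w_1\Big(\frac{x_1+i-u_1}{h}\Big)\, w_2\Big(\frac{x_2+j-u_2}{h}\Big)\, \frac{\partial^2}{\partial u_1 \partial u_2}\, g(u_1,u_2)\,du_1\,du_2.
\end{aligned} \]

By the substitutions $v_1 := u_1 - i$ and $v_2 := u_2 - j$ we get

\[ E f^{++}_{nh}(x_1,x_2) = \frac{1}{h^2} \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w_1\Big(\frac{x_1-v_1}{h}\Big)\, w_2\Big(\frac{x_2-v_2}{h}\Big)\, \frac{\partial^2}{\partial v_1 \partial v_2}\, g(v_1+i,v_2+j)\,dv_1\,dv_2. \tag{53} \]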
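Now we need to interchange integrals and sums again. Therefore, rewrite the equation above
as

\[ \begin{aligned}
E f^{++}_{nh}(x_1,x_2)
&= \frac{1}{h^2} \lim_{M_1\to\infty}\lim_{M_2\to\infty} \sum_{i=1}^{M_1}\sum_{j=1}^{M_2} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w_1\Big(\frac{x_1-v_1}{h}\Big)\, w_2\Big(\frac{x_2-v_2}{h}\Big)\, \frac{\partial^2}{\partial v_1 \partial v_2}\, g(v_1+i,v_2+j)\,dv_1\,dv_2 \\
&= \frac{1}{h^2} \lim_{M_1\to\infty}\lim_{M_2\to\infty} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w_1\Big(\frac{x_1-v_1}{h}\Big)\, w_2\Big(\frac{x_2-v_2}{h}\Big) \sum_{i=1}^{M_1}\sum_{j=1}^{M_2} \frac{\partial^2}{\partial v_1 \partial v_2}\, g(v_1+i,v_2+j)\,dv_1\,dv_2.
\end{aligned} \tag{54} \]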

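By (42) we have $g(v_1+i,v_2) = F(v_1+i,v_2) - F(v_1+i,v_2-1) - F(v_1+i-1,v_2) + F(v_1+i-1,v_2-1)$,
so

\[ \frac{\partial^2}{\partial v_1 \partial v_2}\, g(v_1+i,v_2) = f(v_1+i,v_2) - f(v_1+i,v_2-1) - f(v_1+i-1,v_2) + f(v_1+i-1,v_2-1). \]

Following the summation of (43), we find

\[ \begin{aligned}
\sum_{i=1}^{M_1} \frac{\partial^2}{\partial v_1 \partial v_2}\, g(v_1+i,v_2)
&= \sum_{i=1}^{M_1} \{f(v_1+i,v_2) - f(v_1+i,v_2-1) - f(v_1+i-1,v_2) + f(v_1+i-1,v_2-1)\} \\
&= \sum_{i=1}^{M_1} \{f(v_1+i,v_2) - f(v_1+i-1,v_2)\} + \sum_{i=1}^{M_1} \{f(v_1+i-1,v_2-1) - f(v_1+i,v_2-1)\} \\
&= f(v_1+M_1,v_2) - f(v_1,v_2) + f(v_1,v_2-1) - f(v_1+M_1,v_2-1)
\end{aligned} \tag{55} \]

and

\[ \begin{aligned}
\sum_{j=1}^{M_2}\sum_{i=1}^{M_1} \frac{\partial^2}{\partial v_1 \partial v_2}\, g(v_1+i,v_2+j)
&= \sum_{j=1}^{M_2} \{f(v_1+M_1,v_2+j) - f(v_1,v_2+j) + f(v_1,v_2+j-1) - f(v_1+M_1,v_2+j-1)\} \\
&= \sum_{j=1}^{M_2} \{f(v_1+M_1,v_2+j) - f(v_1+M_1,v_2+j-1)\} + \sum_{j=1}^{M_2} \{f(v_1,v_2+j-1) - f(v_1,v_2+j)\} \\
&= f(v_1+M_1,v_2+M_2) - f(v_1+M_1,v_2) + f(v_1,v_2) - f(v_1,v_2+M_2).
\end{aligned} \tag{56} \]

Note that this sum is finite for all $v_1, v_2$, because $f$ is bounded. Also note that changing the
order of summation is allowed, because $M_1, M_2 < \infty$. By Lemma 2.1 we have

\[ \lim_{M_1\to\infty}\lim_{M_2\to\infty} \sum_{i=1}^{M_1}\sum_{j=1}^{M_2} \frac{\partial^2}{\partial v_1 \partial v_2}\, g(v_1+i,v_2+j) = f(v_1,v_2). \tag{57} \]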
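We have assumed that $f$ is bounded, so let $f(v_1,v_2) \le \tfrac14 A$ for all $v_1,v_2$, where $A > 0$ is a constant. Observe the following inequality

\[
\begin{aligned}
&|f(v_1+M_1,v_2+M_2) - f(v_1+M_1,v_2) + f(v_1,v_2) - f(v_1,v_2+M_2)| \\
&\quad\le |f(v_1+M_1,v_2+M_2)| + |f(v_1+M_1,v_2)| + |f(v_1,v_2)| + |f(v_1,v_2+M_2)| \le A,
\end{aligned} \tag{58}
\]

for all $v_1,v_2,M_1$ and $M_2$. Note that, because $w_1$ and $w_2$ are nonnegative, bounded and have bounded support,

\[
\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w_1\Big(\frac{x_1-v_1}{h}\Big)\,w_2\Big(\frac{x_2-v_2}{h}\Big)\,dv_1dv_2 < \infty \tag{59}
\]

for all $x_1,x_2$. Thus we can apply the Lebesgue Dominated Convergence Theorem to (53), and find

\[
\begin{aligned}
&\lim_{M_1\to\infty}\lim_{M_2\to\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w_1\Big(\frac{x_1-v_1}{h}\Big)\,w_2\Big(\frac{x_2-v_2}{h}\Big)\sum_{i=1}^{M_1}\sum_{j=1}^{M_2}\frac{\partial^2}{\partial v_1\partial v_2}g(v_1+i,v_2+j)\,dv_1dv_2 \\
&\quad= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w_1\Big(\frac{x_1-v_1}{h}\Big)\,w_2\Big(\frac{x_2-v_2}{h}\Big)\lim_{M_1\to\infty}\lim_{M_2\to\infty}\sum_{i=1}^{M_1}\sum_{j=1}^{M_2}\frac{\partial^2}{\partial v_1\partial v_2}g(v_1+i,v_2+j)\,dv_1dv_2 \\
&\quad= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w_1\Big(\frac{x_1-v_1}{h}\Big)\,w_2\Big(\frac{x_2-v_2}{h}\Big)\,f(v_1,v_2)\,dv_1dv_2.
\end{aligned} \tag{60}
\]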



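Summarizing we now have

\[
\mathrm{E} f^{++}_{nh}(x_1,x_2) = \frac{1}{h^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w_1\Big(\frac{x_1-v_1}{h}\Big)\,w_2\Big(\frac{x_2-v_2}{h}\Big)\,f(v_1,v_2)\,dv_1dv_2. \tag{61}
\]

Substituting $z_1 := (x_1-v_1)/h$ and $z_2 := (x_2-v_2)/h$ we get

\[
\mathrm{E} f^{++}_{nh}(x_1,x_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w_1(z_1)\,w_2(z_2)\,f(x_1-hz_1,x_2-hz_2)\,dz_1dz_2. \tag{62}
\]

The multivariate version of Taylor's theorem, as given in Wand and Jones (1995), allows us to write

\[
f(x_1-hz_1,x_2-hz_2) = f(x_1,x_2) - h\,(z_1f_1 + z_2f_2)(x_1,x_2) + \tfrac12 h^2\big(z_1^2f_{11} + z_1z_2(f_{12}+f_{21}) + z_2^2f_{22}\big)(x_1,x_2) + o(h^2).
\]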
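We now obtain

\[
\begin{aligned}
\mathrm{E} f^{++}_{nh}(x_1,x_2)
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w_1(z_1)w_2(z_2)\Big(f(x_1,x_2) - h\,(z_1f_1+z_2f_2)(x_1,x_2) \\
&\qquad\qquad + \tfrac12h^2\big(z_1^2f_{11} + z_1z_2(f_{12}+f_{21}) + z_2^2f_{22}\big)(x_1,x_2) + o(h^2)\Big)\,dz_1dz_2 \\
&= f(x_1,x_2) - hf_1(x_1,x_2)\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} z_1w_1(z_1)w_2(z_2)\,dz_1dz_2
 - hf_2(x_1,x_2)\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} z_2w_1(z_1)w_2(z_2)\,dz_1dz_2 \\
&\quad + \tfrac12h^2f_{11}(x_1,x_2)\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} z_1^2w_1(z_1)w_2(z_2)\,dz_1dz_2
 + \tfrac12h^2(f_{12}+f_{21})(x_1,x_2)\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} z_1z_2w_1(z_1)w_2(z_2)\,dz_1dz_2 \\
&\quad + \tfrac12h^2f_{22}(x_1,x_2)\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} z_2^2w_1(z_1)w_2(z_2)\,dz_1dz_2 + o(h^2) \\
&= f(x_1,x_2) + \tfrac12h^2\Big(\int_{-\infty}^{\infty} z^2w_1(z)\,dz\,f_{11}(x_1,x_2) + \int_{-\infty}^{\infty} z^2w_2(z)\,dz\,f_{22}(x_1,x_2)\Big) + o(h^2).
\end{aligned}
\]

In the last step the first order terms and the mixed term vanish because $w_1$ and $w_2$ are symmetric probability densities. This proves statement (22) of the theorem for this individual estimator.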
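It is easily seen that

\[
\begin{aligned}
\mathrm{E} f^{--}_{nh}(x_1,x_2) &= \mathrm{E} f^{-+}_{nh}(x_1,x_2) = \mathrm{E} f^{+-}_{nh}(x_1,x_2) = \mathrm{E} f^{++}_{nh}(x_1,x_2) \\
&= f(x_1,x_2) + \tfrac12h^2\Big(\int_{-\infty}^{\infty} z^2w_1(z)\,dz\,f_{11}(x_1,x_2) + \int_{-\infty}^{\infty} z^2w_2(z)\,dz\,f_{22}(x_1,x_2)\Big) + o(h^2)
\end{aligned}
\]

and thus, since the weights sum to one,

\[
\mathrm{E} f^{t}_{nh}(x_1,x_2) = f(x_1,x_2) + \tfrac12h^2\Big(\int_{-\infty}^{\infty} z^2w_1(z)\,dz\,f_{11}(x_1,x_2) + \int_{-\infty}^{\infty} z^2w_2(z)\,dz\,f_{22}(x_1,x_2)\Big) + o(h^2),
\]

proving the expansion of the expectation for every convex combination.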

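Next let us derive the asymptotic variance. First, define

\[
U^{++}_{kh}(x_1,x_2) := \frac{1}{h^4}\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} w_1'\Big(\frac{x_1+i-X_{k1}}{h}\Big)\,w_2'\Big(\frac{x_2+j-X_{k2}}{h}\Big). \tag{63}
\]

Then $f^{++}_{nh}(x_1,x_2) = \frac1n\sum_{k=1}^{n}U^{++}_{kh}(x_1,x_2)$, and since the terms $U^{++}_{kh}$ are independent,

\[
\mathrm{Var}\big(f^{++}_{nh}(x_1,x_2)\big) = \frac1n\mathrm{Var}\big(U^{++}_{1h}(x_1,x_2)\big). \tag{64}
\]

Secondly, we will determine the variance of $U^{++}_{1h}(x_1,x_2)$. We have

\[
\mathrm{Var}\big(U^{++}_{1h}(x_1,x_2)\big) = \mathrm{E}\,U^{++}_{1h}(x_1,x_2)^2 - \big(\mathrm{E}\,U^{++}_{1h}(x_1,x_2)\big)^2. \tag{65}
\]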
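Let us begin with determining $\mathrm{E}\,U^{++}_{1h}(x_1,x_2)^2$. Note that, if $h < \tfrac12$, we have

\[
w_1'\Big(\frac{x_1+i_1-X_{k1}}{h}\Big)\,w_2'\Big(\frac{x_2+j_1-X_{k2}}{h}\Big)\,w_1'\Big(\frac{x_1+i_2-X_{k1}}{h}\Big)\,w_2'\Big(\frac{x_2+j_2-X_{k2}}{h}\Big) = 0 \tag{66}
\]

unless $i_1 = i_2$ and $j_1 = j_2$, where $i_1,i_2,j_1,j_2$ are positive integers. This holds because if $i_1 \ne i_2$ or $j_1 \ne j_2$, then at least two of the arguments in the product (66) are more than distance two apart, while the support of $w_1'$ and $w_2'$ is contained in $[-1,1]$, rendering the product equal to zero. Thus in the following equation, as $h \to 0$, only the squared terms do not vanish and we can write

\[
\mathrm{E}\,U^{++}_{1h}(x_1,x_2)^2 = \mathrm{E}\Big(\frac{1}{h^4}\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} w_1'\Big(\frac{x_1+i-X_{11}}{h}\Big)w_2'\Big(\frac{x_2+j-X_{12}}{h}\Big)\Big)^2
= \frac{1}{h^8}\sum_{i=1}^{\infty}\sum_{j=1}^{\infty}\mathrm{E}\Big(w_1'\Big(\frac{x_1+i-X_{11}}{h}\Big)w_2'\Big(\frac{x_2+j-X_{12}}{h}\Big)\Big)^2.
\]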



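Now we use the substitutions $v_1 := u_1 - i$ and $v_2 := u_2 - j$ to obtain

\[
\begin{aligned}
\mathrm{E}\,U^{++}_{1h}(x_1,x_2)^2
&= \frac{1}{h^8}\sum_{i=1}^{\infty}\sum_{j=1}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\Big(w_1'\Big(\frac{x_1+i-u_1}{h}\Big)w_2'\Big(\frac{x_2+j-u_2}{h}\Big)\Big)^2 g(u_1,u_2)\,du_1du_2 \\
&= \frac{1}{h^8}\sum_{i=1}^{\infty}\sum_{j=1}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\Big(w_1'\Big(\frac{x_1-v_1}{h}\Big)w_2'\Big(\frac{x_2-v_2}{h}\Big)\Big)^2 g(v_1+i,v_2+j)\,dv_1dv_2.
\end{aligned}
\]

Note that the integrand is nonnegative, thus interchanging sums and integrals is allowed (Fubini), so

\[
\mathrm{E}\,U^{++}_{1h}(x_1,x_2)^2 = \frac{1}{h^8}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\Big(w_1'\Big(\frac{x_1-v_1}{h}\Big)w_2'\Big(\frac{x_2-v_2}{h}\Big)\Big)^2 F^{++}(v_1,v_2)\,dv_1dv_2.
\]

Now apply the substitutions $z_1 = (x_1-v_1)/h$ and $z_2 = (x_2-v_2)/h$ and recall the bounded support of $w_1'$ and $w_2'$. Furthermore, because $\lim_{h\to0}F^{++}(x_1-hz_1,x_2-hz_2) = F^{++}(x_1,x_2) \le 1$, we can again apply the Lebesgue dominated convergence theorem:

\[
\begin{aligned}
\mathrm{E}\,U^{++}_{1h}(x_1,x_2)^2
&= \frac{1}{h^6}\int\!\!\int w_1'(z_1)^2w_2'(z_2)^2F^{++}(x_1-hz_1,x_2-hz_2)\,dz_1dz_2 \\
&= \frac{1}{h^6}F^{++}(x_1,x_2)\int w_1'(z_1)^2dz_1\int w_2'(z_2)^2dz_2 + o(h^{-6}).
\end{aligned} \tag{67}
\]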
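Now note that $\mathrm{E}\,U^{++}_{1h}(x_1,x_2) = \mathrm{E} f^{++}_{nh}(x_1,x_2) = f(x_1,x_2) + O(h^2)$. So

\[
\begin{aligned}
\mathrm{Var}\big(f^{++}_{nh}(x_1,x_2)\big) &= \frac1n\mathrm{Var}\big(U^{++}_{1h}(x_1,x_2)\big) \\
&= \frac1n\Big[\frac{1}{h^6}F^{++}(x_1,x_2)\int w_1'(z_1)^2dz_1\int w_2'(z_2)^2dz_2 + o(h^{-6}) - f(x_1,x_2)^2 - O(h^2)\Big] \\
&= \frac{1}{nh^6}F^{++}(x_1,x_2)\int w_1'(z_1)^2dz_1\int w_2'(z_2)^2dz_2 + o(n^{-1}h^{-6}).
\end{aligned}
\]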



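We can follow a similar procedure to obtain the variances of the other estimators. To summarize, we get

\[
\begin{aligned}
\mathrm{Var}\big(f^{--}_{nh}(x_1,x_2)\big) &= \frac{1}{nh^6}F^{--}(x_1,x_2)\int w_1'(z_1)^2dz_1\int w_2'(z_2)^2dz_2 + o(n^{-1}h^{-6}), \\
\mathrm{Var}\big(f^{-+}_{nh}(x_1,x_2)\big) &= \frac{1}{nh^6}F^{-+}(x_1,x_2)\int w_1'(z_1)^2dz_1\int w_2'(z_2)^2dz_2 + o(n^{-1}h^{-6}), \\
\mathrm{Var}\big(f^{+-}_{nh}(x_1,x_2)\big) &= \frac{1}{nh^6}F^{+-}(x_1,x_2)\int w_1'(z_1)^2dz_1\int w_2'(z_2)^2dz_2 + o(n^{-1}h^{-6}), \\
\mathrm{Var}\big(f^{++}_{nh}(x_1,x_2)\big) &= \frac{1}{nh^6}F^{++}(x_1,x_2)\int w_1'(z_1)^2dz_1\int w_2'(z_2)^2dz_2 + o(n^{-1}h^{-6}).
\end{aligned}
\]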



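Now let us determine the variance of combinations of these estimators. Omitting the arguments $(x_1,x_2)$, we have

\[
\begin{aligned}
\mathrm{Var}\big(f^{t}_{nh}\big) &= \mathrm{Var}\big(t_1f^{--}_{nh} + t_2f^{-+}_{nh} + t_3f^{+-}_{nh} + t_4f^{++}_{nh}\big) \\
&= t_1^2\mathrm{Var}(f^{--}_{nh}) + t_2^2\mathrm{Var}(f^{-+}_{nh}) + t_3^2\mathrm{Var}(f^{+-}_{nh}) + t_4^2\mathrm{Var}(f^{++}_{nh}) \\
&\quad + 2t_1t_2\mathrm{Cov}(f^{--}_{nh},f^{-+}_{nh}) + 2t_1t_3\mathrm{Cov}(f^{--}_{nh},f^{+-}_{nh}) + 2t_1t_4\mathrm{Cov}(f^{--}_{nh},f^{++}_{nh}) \\
&\quad + 2t_2t_3\mathrm{Cov}(f^{-+}_{nh},f^{+-}_{nh}) + 2t_2t_4\mathrm{Cov}(f^{-+}_{nh},f^{++}_{nh}) + 2t_3t_4\mathrm{Cov}(f^{+-}_{nh},f^{++}_{nh}).
\end{aligned}
\]

Let us look at $\mathrm{Cov}(f^{--}_{nh}(x_1,x_2),f^{-+}_{nh}(x_1,x_2))$. In similar fashion as we determined the variance, we find

\[
\begin{aligned}
\mathrm{Cov}\big(f^{--}_{nh}(x_1,x_2),f^{-+}_{nh}(x_1,x_2)\big) &= \frac1n\mathrm{Cov}\big(U^{--}_{1h}(x_1,x_2),U^{-+}_{1h}(x_1,x_2)\big) \\
&= \frac1n\Big[\mathrm{E}\,U^{--}_{1h}(x_1,x_2)U^{-+}_{1h}(x_1,x_2) - \mathrm{E}\,U^{--}_{1h}(x_1,x_2)\,\mathrm{E}\,U^{-+}_{1h}(x_1,x_2)\Big].
\end{aligned}
\]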

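Let us first determine $\mathrm{E}\,U^{--}_{1h}(x_1,x_2)U^{-+}_{1h}(x_1,x_2)$. Note that, if $h < \tfrac12$, we have

\[
w_1'\Big(\frac{x_1-i_1-X_{k1}}{h}\Big)\,w_2'\Big(\frac{x_2-i_2-X_{k2}}{h}\Big)\,w_1'\Big(\frac{x_1-j_1-X_{k1}}{h}\Big)\,w_2'\Big(\frac{x_2+j_2-X_{k2}}{h}\Big) = 0, \tag{68}
\]

for all $i_1,i_2,j_1$ and $j_2$. This holds because the second and fourth argument in the product (68) are always more than distance two apart, rendering the product equal to zero. Thus

\[
\mathrm{E}\,U^{--}_{1h}(x_1,x_2)U^{-+}_{1h}(x_1,x_2) = 0. \tag{69}
\]

Secondly, because we have already determined $\mathrm{E}\,U^{--}_{1h}(x_1,x_2)$ and $\mathrm{E}\,U^{-+}_{1h}(x_1,x_2)$ earlier, we know that

\[
\mathrm{E}\,U^{--}_{1h}(x_1,x_2)\,\mathrm{E}\,U^{-+}_{1h}(x_1,x_2) = f(x_1,x_2)^2 + O(h^2). \tag{70}
\]

Thus

\[
\mathrm{Cov}\big(f^{--}_{nh}(x_1,x_2),f^{-+}_{nh}(x_1,x_2)\big) = \frac1n\big[-f(x_1,x_2)^2 - O(h^2)\big] = o(n^{-1}h^{-6}). \tag{71}
\]

This result holds for all the covariances. So we arrive at

\[
\begin{aligned}
\mathrm{Var}\big(f^{t}_{nh}(x_1,x_2)\big) &= \big(t_1^2F^{--} + t_2^2F^{-+} + t_3^2F^{+-} + t_4^2F^{++}\big)(x_1,x_2)\,\frac{1}{nh^6}\int_{-1}^{1}w_1'(z_1)^2dz_1\int_{-1}^{1}w_2'(z_2)^2dz_2 + o(n^{-1}h^{-6}) \\
&= B(x_1,x_2,t_1,t_2,t_3,t_4)\,\frac{1}{nh^6}\int_{-1}^{1}w_1'(z_1)^2dz_1\int_{-1}^{1}w_2'(z_2)^2dz_2 + o(n^{-1}h^{-6}).
\end{aligned}
\]

This proves statement (23) of the theorem. □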
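6.3 Proof of Theorem 4.1

The convex combination of the four density estimators is given by

\[
f^{t}_{nh}(x_1,x_2) = t_1f^{--}_{nh}(x_1,x_2) + t_2f^{-+}_{nh}(x_1,x_2) + t_3f^{+-}_{nh}(x_1,x_2) + t_4f^{++}_{nh}(x_1,x_2), \tag{72}
\]

where $t_1+t_2+t_3+t_4 = 1$. Now define

\[
\begin{aligned}
S_{1nh}(x_1,x_2) &= f^{--}_{nh}(x_1,x_2) - f^{+-}_{nh}(x_1,x_2), \\
S_{2nh}(x_1,x_2) &= -f^{-+}_{nh}(x_1,x_2) + f^{++}_{nh}(x_1,x_2), \\
S_{3nh}(x_1,x_2) &= f^{--}_{nh}(x_1,x_2) - f^{-+}_{nh}(x_1,x_2), \\
S_{4nh}(x_1,x_2) &= -f^{+-}_{nh}(x_1,x_2) + f^{++}_{nh}(x_1,x_2).
\end{aligned}
\]

We can rewrite (72) as

\[
f^{t}_{nh}(x_1,x_2) = f^{--}_{nh}(x_1,x_2) - (t_3+t_4)S_{1nh}(x_1,x_2) - t_2S_{3nh}(x_1,x_2) + t_4S_{4nh}(x_1,x_2). \tag{73}
\]

Lemma 6.1 Under the conditions of Theorem 4.1 we have, for $i = 1,\dots,4$,

\[
\mathrm{E}\,S_{inh}(x_1,x_2) = 0, \tag{74}
\]
\[
\mathrm{E}\,S_{inh}(x_1,x_2)^2 = O\Big(\frac{1}{nh^6}\Big), \tag{75}
\]
\[
\mathrm{E}\,S_{inh}(x_1,x_2)^4 = O\Big(\frac{1}{n^2h^{12}}\Big). \tag{76}
\]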
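Proof

We give the proof for $S_{1nh}(x_1,x_2)$. The other claims can be proved similarly. Note that

\[
\begin{aligned}
S_{1nh}(x_1,x_2) &= f^{--}_{nh}(x_1,x_2) - f^{+-}_{nh}(x_1,x_2) \\
&= \frac{1}{nh^4}\sum_{k=1}^{n}\sum_{i=0}^{\infty}\sum_{j=0}^{\infty}w_1'\Big(\frac{x_1-i-X_{k1}}{h}\Big)w_2'\Big(\frac{x_2-j-X_{k2}}{h}\Big) \\
&\quad + \frac{1}{nh^4}\sum_{k=1}^{n}\sum_{i=1}^{\infty}\sum_{j=0}^{\infty}w_1'\Big(\frac{x_1+i-X_{k1}}{h}\Big)w_2'\Big(\frac{x_2-j-X_{k2}}{h}\Big) \\
&= \frac{1}{nh^4}\sum_{k=1}^{n}\sum_{i=-\infty}^{\infty}\sum_{j=0}^{\infty}w_1'\Big(\frac{x_1-i-X_{k1}}{h}\Big)w_2'\Big(\frac{x_2-j-X_{k2}}{h}\Big).
\end{aligned}
\]

Define

\[
U_{1kh}(x_1,x_2) := \frac{1}{h^4}\sum_{i=-\infty}^{\infty}\sum_{j=0}^{\infty}w_1'\Big(\frac{x_1-i-X_{k1}}{h}\Big)w_2'\Big(\frac{x_2-j-X_{k2}}{h}\Big). \tag{77}
\]

Then $S_{1nh}(x_1,x_2) = \frac1n\sum_{k=1}^{n}U_{1kh}(x_1,x_2)$ and the terms in the sum are independent.

Following similar steps as in the proof of Theorem 3.1 we get

\[
\mathrm{E}\,S_{1nh}(x_1,x_2) = \frac{1}{h^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}w_1\Big(\frac{x_1-v_1}{h}\Big)w_2\Big(\frac{x_2-v_2}{h}\Big)\frac{\partial^2}{\partial v_1\partial v_2}\sum_{i=-\infty}^{\infty}\sum_{j=0}^{\infty}g(v_1-i,v_2-j)\,dv_1dv_2 = 0,
\]

since the sum over all integers $i$ does not depend on $v_1$.

We also have, as in the same proof,

\[
\mathrm{E}\,S_{1nh}(x_1,x_2)^2 = \mathrm{Var}\big(S_{1nh}(x_1,x_2)\big) = \frac1n\mathrm{Var}\big(U_{11h}(x_1,x_2)\big) = O\Big(\frac{1}{nh^6}\Big).
\]

Finally we consider the fourth moment of $S_{1nh}(x_1,x_2) = \frac1n\sum_{k=1}^{n}U_{1kh}(x_1,x_2)$. By independence of the terms, which have expectation zero, we have

\[
\mathrm{E}\,S_{1nh}(x_1,x_2)^4 = \frac{1}{n^3}\mathrm{E}\,U_{11h}(x_1,x_2)^4 + \frac{3(n-1)}{n^3}\Big(\mathrm{E}\,U_{11h}(x_1,x_2)^2\Big)^2.
\]

This completes the proof of the lemma. □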
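From (73) we get, omitting the arguments $(x_1,x_2)$,

\[
f^{\hat t_n}_{nh} - f^{\bar t}_{nh} = -(\hat t_{n3}-\bar t_3)S_{1nh} - (\hat t_{n4}-\bar t_4)S_{1nh} - (\hat t_{n2}-\bar t_2)S_{3nh} + (\hat t_{n4}-\bar t_4)S_{4nh}. \tag{78}
\]

Hence, under the assumptions of the theorem and by the Cauchy-Schwarz inequality, we have

\[
\begin{aligned}
\big|\mathrm{E} f^{\hat t_n}_{nh} - \mathrm{E} f^{\bar t}_{nh}\big|
&\le \mathrm{E}\,|\hat t_{n3}-\bar t_3||S_{1nh}| + \mathrm{E}\,|\hat t_{n4}-\bar t_4||S_{1nh}| + \mathrm{E}\,|\hat t_{n2}-\bar t_2||S_{3nh}| + \mathrm{E}\,|\hat t_{n4}-\bar t_4||S_{4nh}| \\
&\le \big(\mathrm{E}(\hat t_{n3}-\bar t_3)^2\big)^{1/2}\big(\mathrm{E}\,S_{1nh}^2\big)^{1/2} + \big(\mathrm{E}(\hat t_{n4}-\bar t_4)^2\big)^{1/2}\big(\mathrm{E}\,S_{1nh}^2\big)^{1/2} \\
&\quad + \big(\mathrm{E}(\hat t_{n2}-\bar t_2)^2\big)^{1/2}\big(\mathrm{E}\,S_{3nh}^2\big)^{1/2} + \big(\mathrm{E}(\hat t_{n4}-\bar t_4)^2\big)^{1/2}\big(\mathrm{E}\,S_{4nh}^2\big)^{1/2}.
\end{aligned}
\]

Similarly we have, since $(y_1+y_2+y_3+y_4)^2 \le 4(y_1^2+y_2^2+y_3^2+y_4^2)$,

\[
\begin{aligned}
\mathrm{Var}\big(f^{\hat t_n}_{nh} - f^{\bar t}_{nh}\big) &\le \mathrm{E}\big(f^{\hat t_n}_{nh} - f^{\bar t}_{nh}\big)^2 \\
&\le 4\,\mathrm{E}(\hat t_{n3}-\bar t_3)^2S_{1nh}^2 + 4\,\mathrm{E}(\hat t_{n4}-\bar t_4)^2S_{1nh}^2 + 4\,\mathrm{E}(\hat t_{n2}-\bar t_2)^2S_{3nh}^2 + 4\,\mathrm{E}(\hat t_{n4}-\bar t_4)^2S_{4nh}^2 \\
&\le 4\big(\mathrm{E}(\hat t_{n3}-\bar t_3)^4\big)^{1/2}\big(\mathrm{E}\,S_{1nh}^4\big)^{1/2} + 4\big(\mathrm{E}(\hat t_{n4}-\bar t_4)^4\big)^{1/2}\big(\mathrm{E}\,S_{1nh}^4\big)^{1/2} \\
&\quad + 4\big(\mathrm{E}(\hat t_{n2}-\bar t_2)^4\big)^{1/2}\big(\mathrm{E}\,S_{3nh}^4\big)^{1/2} + 4\big(\mathrm{E}(\hat t_{n4}-\bar t_4)^4\big)^{1/2}\big(\mathrm{E}\,S_{4nh}^4\big)^{1/2}.
\end{aligned}
\]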




Since the two bounds above are negligible compared to the order of the bias and variance in Theorem 3.1, it follows that this theorem also holds for the estimator with estimated weights.

In order to prove asymptotic normality, note that by Lemma 6.1 and condition (34) it follows that $\sqrt{nh^6}$ times each of the terms in the representation (78) vanishes in probability. It also follows that $\sqrt{nh^6}$ times the expectation of (78) vanishes asymptotically. Hence the limit distributions of $\sqrt{nh^6}\,(f^{\hat t_n}_{nh} - \mathrm{E} f^{\hat t_n}_{nh})$ and $\sqrt{nh^6}\,(f^{\bar t}_{nh} - \mathrm{E} f^{\bar t}_{nh})$ coincide. The limit distribution of the latter follows by checking the Lyapounov condition for asymptotic normality.

□ 



6.4 Proof of Lemma 4.3

Proof Let us first introduce some notation. Define the vectors $\mathbf{v}(x_1,x_2)$ and $\mathbf{v}_{n\tilde h}(x_1,x_2)$ by

\[
\begin{aligned}
\mathbf{v}(x_1,x_2) &= \big(F^{--}(x_1,x_2),F^{-+}(x_1,x_2),F^{+-}(x_1,x_2),F^{++}(x_1,x_2)\big), \\
\mathbf{v}_{n\tilde h}(x_1,x_2) &= \big(\tilde F^{--}_{n\tilde h}(x_1,x_2),\tilde F^{-+}_{n\tilde h}(x_1,x_2),\tilde F^{+-}_{n\tilde h}(x_1,x_2),\tilde F^{++}_{n\tilde h}(x_1,x_2)\big).
\end{aligned}
\]

Note that, for $n$ large enough, the components of these vectors are all at least $\epsilon_n$ and at most one.

We will only check (29) and (31) for $i$ equal to one. The other cases can be treated similarly. We also need the vector of partial derivatives of the function $t_1(y_1,y_2,y_3,y_4)$. Note that on the line segment between $\mathbf{v}_{n\tilde h}(x_1,x_2)$ and $\mathbf{v}(x_1,x_2)$ all the components are at least $\epsilon_n$ and at most one. This implies, after some computation, that $\|\nabla t_1(y_1,y_2,y_3,y_4)\|^2$ is bounded by a constant $B$ divided by a power of $\epsilon_n$,
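for some constant $B$, for all points $(y_1,y_2,y_3,y_4)$ on this line segment.

We can now apply the multivariate mean value theorem and the Cauchy-Schwarz inequality to get

\[
\begin{aligned}
\big(\hat t_{n1}(x_1,x_2) - \bar t_1(x_1,x_2)\big)^2 &= \big(t_1(\mathbf{v}_{n\tilde h}(x_1,x_2)) - t_1(\mathbf{v}(x_1,x_2))\big)^2 \\
&= \big((\mathbf{v}_{n\tilde h}(x_1,x_2) - \mathbf{v}(x_1,x_2))\cdot\nabla t_1(y_1,y_2,y_3,y_4)\big)^2 \\
&\le \|\mathbf{v}_{n\tilde h}(x_1,x_2) - \mathbf{v}(x_1,x_2)\|^2\,\|\nabla t_1(y_1,y_2,y_3,y_4)\|^2,
\end{aligned}
\]

where $(y_1,y_2,y_3,y_4)$ is a point on the line segment between $\mathbf{v}_{n\tilde h}(x_1,x_2)$ and $\mathbf{v}(x_1,x_2)$, and where the last factor is bounded by the gradient bound above. Note that $\|\mathbf{v}_{n\tilde h}(x_1,x_2) - \mathbf{v}(x_1,x_2)\|^2$ is a sum of four terms like $(\tilde F^{--}_{n\tilde h}(x_1,x_2) - F^{--}(x_1,x_2))^2$, which is smaller than $(\hat F^{--}_{n\tilde h}(x_1,x_2) - F^{--}(x_1,x_2))^2$, and that $\mathrm{E}(\hat F^{--}_{n\tilde h}(x_1,x_2) - F^{--}(x_1,x_2))^2$ equals the variance plus the squared bias of $\hat F^{--}_{n\tilde h}(x_1,x_2)$. By Theorem 4.2 these are of order $O(1/(n\tilde h^2))$ and $O(\tilde h^4)$ respectively, so that, for a bandwidth $\tilde h$ of order $n^{-1/6}$, $\mathrm{E}(\hat t_{n1}(x_1,x_2) - \bar t_1(x_1,x_2))^2$ is of order $n^{-2/3}$ up to a power of $\log n$. This implies that (29) is satisfied.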

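Let us now check that (31) is satisfied. By an argument similar to the one above it suffices to check that terms like $\mathrm{E}(\hat F^{--}_{n\tilde h}(x_1,x_2) - F^{--}(x_1,x_2))^4$ vanish asymptotically. Write

\[
\hat F^{--}_{n\tilde h}(x_1,x_2) - F^{--}(x_1,x_2) = \hat F^{--}_{n\tilde h}(x_1,x_2) - \mathrm{E}\hat F^{--}_{n\tilde h}(x_1,x_2) + \mathrm{E}\hat F^{--}_{n\tilde h}(x_1,x_2) - F^{--}(x_1,x_2).
\]

By the triangle inequality we have

\[
\Big(\mathrm{E}\big(\hat F^{--}_{n\tilde h}(x_1,x_2) - F^{--}(x_1,x_2)\big)^4\Big)^{1/4} \le \Big(\mathrm{E}\big(\hat F^{--}_{n\tilde h}(x_1,x_2) - \mathrm{E}\hat F^{--}_{n\tilde h}(x_1,x_2)\big)^4\Big)^{1/4} + \big|\mathrm{E}\hat F^{--}_{n\tilde h}(x_1,x_2) - F^{--}(x_1,x_2)\big|.
\]

So, by $(a+b)^4 \le 8(a^4+b^4)$, $a,b \ge 0$, we also have

\[
\mathrm{E}\big(\hat F^{--}_{n\tilde h}(x_1,x_2) - F^{--}(x_1,x_2)\big)^4 \le 8\,\mathrm{E}\big(\hat F^{--}_{n\tilde h}(x_1,x_2) - \mathrm{E}\hat F^{--}_{n\tilde h}(x_1,x_2)\big)^4 + 8\big(\mathrm{E}\hat F^{--}_{n\tilde h}(x_1,x_2) - F^{--}(x_1,x_2)\big)^4.
\]

Since the bias vanishes by Theorem 4.2, it suffices to prove the bound of the lemma for the fourth power of the centered error.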

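Recall from the proof of Theorem 4.2 that $\hat F^{--}_{n\tilde h}(x_1,x_2) = \frac1n\sum_{k=1}^{n}U^{--}_{k\tilde h}(x_1,x_2)$, where

\[
U^{--}_{k\tilde h}(x_1,x_2) := \frac{1}{\tilde h^2}\sum_{i=0}^{\infty}\sum_{j=0}^{\infty}w\Big(\frac{x_1-i-X_{k1}}{\tilde h}\Big)w\Big(\frac{x_2-j-X_{k2}}{\tilde h}\Big).
\]

Note that the $U^{--}_{k\tilde h}$ are independent. Now write

\[
\hat F^{--}_{n\tilde h}(x_1,x_2) - \mathrm{E}\hat F^{--}_{n\tilde h}(x_1,x_2) = \frac1n\sum_{k=1}^{n}\bar U^{--}_{k\tilde h}(x_1,x_2),
\]

where $\bar U^{--}_{k\tilde h}(x_1,x_2) = U^{--}_{k\tilde h}(x_1,x_2) - \mathrm{E}\,U^{--}_{k\tilde h}(x_1,x_2)$. Since $\mathrm{E}\,\bar U^{--}_{k\tilde h}(x_1,x_2)$ equals zero we have

\[
\mathrm{E}\Big(\frac1n\sum_{k=1}^{n}\bar U^{--}_{k\tilde h}(x_1,x_2)\Big)^4 = \frac{1}{n^3}\mathrm{E}\,\bar U^{--}_{1\tilde h}(x_1,x_2)^4 + \frac{3(n-1)}{n^3}\Big(\mathrm{E}\,\bar U^{--}_{1\tilde h}(x_1,x_2)^2\Big)^2.
\]

Similar to the derivation of (67) we get

\[
\frac{1}{n^3}\mathrm{E}\,\bar U^{--}_{1\tilde h}(x_1,x_2)^4 = O\Big(\frac{1}{n^3\tilde h^6}\Big)
\quad\text{and}\quad
\frac{3(n-1)}{n^3}\Big(\mathrm{E}\,\bar U^{--}_{1\tilde h}(x_1,x_2)^2\Big)^2 = O\Big(\frac{1}{n^2\tilde h^4}\Big).
\]

Under the condition on $\tilde h$ in the lemma both terms vanish. This shows that (31) is satisfied as well. Condition (34) follows from condition (31) by the Cauchy-Schwarz inequality. □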

6.5 An inequality

The next lemma can be used to derive the weights that minimize the asymptotic variance of the convex combination of the original four estimators of the density $f$.

Lemma 6.2 Let $a_1,\dots,a_m$ be $m$ positive numbers. Then for all positive $t_1,\dots,t_m$ with $t_1+\dots+t_m = 1$ we have

\[
a_1t_1^2 + \dots + a_mt_m^2 \ge \frac{a_1a_2\cdots a_m}{s_m(a_1,\dots,a_m)}, \tag{79}
\]

where $s_m(a_1,\dots,a_m)$ is defined by

\[
s_m(a_1,\dots,a_m) = a_2a_3\cdots a_m + \sum_{i=2}^{m-1}a_1\cdots a_{i-1}a_{i+1}\cdots a_m + a_1a_2\cdots a_{m-1}, \tag{80}
\]

the sum of the $m$ products of length $m-1$ obtained by skipping one term in the full product.

The minimum is attained at the vector $t$ given by $t_1 = a_2a_3\cdots a_m/s_m(a_1,\dots,a_m)$ and $t_m = a_1a_2\cdots a_{m-1}/s_m(a_1,\dots,a_m)$ and

\[
t_i = \frac{a_1a_2\cdots a_{i-1}a_{i+1}\cdots a_m}{s_m(a_1,\dots,a_m)}, \quad i = 2,\dots,m-1.
\]

Proof Introduce the inner product $\langle\cdot,\cdot\rangle_a$ and corresponding norm $\|\cdot\|_a$ by

\[
\langle x,y\rangle_a = a_2a_3\cdots a_m\,x_1y_1 + a_1a_3\cdots a_m\,x_2y_2 + \dots + a_1a_2\cdots a_{m-1}\,x_my_m, \tag{81}
\]
\[
\|x\|_a = \big(a_2a_3\cdots a_m\,x_1^2 + a_1a_3\cdots a_m\,x_2^2 + \dots + a_1a_2\cdots a_{m-1}\,x_m^2\big)^{1/2}. \tag{82}
\]
Then, with $\mathbf{1}$ equal to the vector of $m$ ones, the Cauchy-Schwarz inequality implies

\[
\begin{aligned}
a_1a_2\cdots a_m &= (a_1a_2\cdots a_m)(t_1+t_2+\dots+t_m) \\
&= \langle\mathbf{1},(a_1t_1,a_2t_2,\dots,a_mt_m)\rangle_a \le \|\mathbf{1}\|_a\,\|(a_1t_1,a_2t_2,\dots,a_mt_m)\|_a \\
&= \sqrt{s_m(a_1,\dots,a_m)}\,\big(a_2a_3\cdots a_m\,(a_1t_1)^2 + a_1a_3\cdots a_m\,(a_2t_2)^2 + \dots + a_1a_2\cdots a_{m-1}\,(a_mt_m)^2\big)^{1/2} \\
&= \sqrt{s_m(a_1,\dots,a_m)}\,\big((a_1a_2\cdots a_m)(a_1t_1^2 + a_2t_2^2 + \dots + a_mt_m^2)\big)^{1/2},
\end{aligned}
\]

which implies the inequality after some rewriting. □
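The inequality (79) and its minimizer can be checked numerically. The following is a minimal sketch in Python; the function names and the example values of $a_1,\dots,a_4$ are illustrative assumptions and do not come from the paper.

```python
import math
import random

def s_m(a):
    """Sum of the m products of length m-1 obtained by skipping one factor, as in (80)."""
    return sum(math.prod(a[:i] + a[i + 1:]) for i in range(len(a)))

def optimal_weights(a):
    """Weights t_i equal to the product of all a_j with j != i, divided by s_m."""
    s = s_m(a)
    return [math.prod(a[:i] + a[i + 1:]) / s for i in range(len(a))]

def weighted_sum(a, t):
    """Left-hand side of (79): a_1 t_1^2 + ... + a_m t_m^2."""
    return sum(ai * ti * ti for ai, ti in zip(a, t))

def lower_bound(a):
    """Right-hand side of (79): a_1 ... a_m / s_m(a_1, ..., a_m)."""
    return math.prod(a) / s_m(a)

def random_weights(m, rng):
    """Random positive weights summing to one (for comparison only)."""
    raw = [rng.random() + 1e-9 for _ in range(m)]
    total = sum(raw)
    return [r / total for r in raw]

# Hypothetical values playing the role of the four quadrant probabilities.
a = [0.1, 0.2, 0.3, 0.4]
t_opt = optimal_weights(a)
```

With these helper functions, `weighted_sum(a, t_opt)` coincides with `lower_bound(a)` up to rounding, while any other weight vector produced by `random_weights` gives a value that is at least as large.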
Abstract

We construct a density estimator in the bivariate uniform deconvolution model. For 
this model we derive four inversion formulas to express the bivariate density that we want 
to estimate in terms of the bivariate density of the observations. By substituting a kernel 
density estimator of the density of the observations we then get four different estimators. 
Next we construct an asymptotically optimal convex combination of these four estimators. 
Expansions for the bias, variance, as well as asymptotic normality, are derived. Some sim- 
ulated examples are presented. 
1 Introduction

Before focusing on bivariate deconvolution let us first consider univariate deconvolution. Let $X_1,\dots,X_n$ be i.i.d. observations, where $X_i = Y_i + Z_i$ and $Y_i$ and $Z_i$ are independent. Assume that the unobservable $Y_i$ have distribution function $F$ and density $f$. Also assume that the unobservable random variables $Z_i$ have a known density $k$. If the $Z_i$ are uniformly distributed then we have a uniform deconvolution problem. Note that the density $g$ of $X_i$ is equal to the convolution of $f$ and $k$, so $g = k * f$ where $*$ denotes convolution. So we have

\[
g(x) = \int_{-\infty}^{\infty}k(x-u)f(u)\,du. \tag{1}
\]

The deconvolution problem is the problem of estimating $f$ or $F$ from the observations $X_i$.
Several generally applicable methods have been proposed for this deconvolution model. The standard Fourier type kernel density estimator for deconvolution problems is based on the Fourier transform, see for instance Wand and Jones (1995). Let $w$ denote a kernel function and $h > 0$ a bandwidth. The estimator $f_{nh}(x)$ of the density $f$ at the point $x$ is defined as

\[
f_{nh}(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-itx}\,\frac{\phi_w(ht)\,\phi_{emp}(t)}{\phi_k(t)}\,dt = \frac{1}{nh}\sum_{j=1}^{n}v_h\Big(\frac{x-X_j}{h}\Big), \tag{2}
\]

with

\[
v_h(u) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{\phi_w(s)}{\phi_k(s/h)}\,e^{-isu}\,ds \quad\text{and}\quad \phi_{emp}(t) = \frac1n\sum_{j=1}^{n}e^{itX_j}
\]

the empirical characteristic function, and $\phi_w$ and $\phi_k$ denote the characteristic functions of $w$ and $k$ respectively. An important condition for these estimators to be properly defined is that the characteristic function $\phi_k$ of the density $k$ has no zeros, which renders the method useless for uniform deconvolution. In fact, Hu and Ridder (2004) argue that in economic applications this assumption is not reasonable. If the error distribution is bounded and symmetric then its characteristic function will have zeros. They propose an approximation of the Fourier transform estimator in such cases. For other modifications of the Fourier inversion method in this problem see Hall and Meister (2007) and Feuerverger, Kim and Sun (2008).

In some univariate deconvolution problems one can apply nonparametric maximum likelihood. In the uniform deconvolution problem for instance the error $Z$ is Uniform$[0,1)$ distributed. So in this particular deconvolution problem we assume to have i.i.d. observations from the density

\[
g(x) = \int_{-\infty}^{\infty}I_{[0,1)}(x-u)f(u)\,du = \int_{x-1}^{x}f(u)\,du = F(x) - F(x-1). \tag{3}
\]

Groeneboom and Jongbloed (2003) consider density estimation in this problem. They propose a 
kernel density estimator based on the nonparametric maximum likelihood estimator (NPMLE) 
of the distribution function F and derive its asymptotic properties. For estimators of the 
distribution function in uniform deconvolution, related to the NPMLE, we refer to Groeneboom 
and Wellner (1992) and Van Es and Van Zuijlen (1996). 

A selected group of deconvolution problems allows explicit inversion formulas of (1) expressing the density of interest $f$ in terms of the density $g$ of the data. In these cases we can
estimate / by substituting for instance a direct kernel density estimate of g in the inversion 
formula. In Van Es and Kok (1998) this strategy has been pursued for deconvolution problems 
where k equals the exponential density, the Laplace density, and their repeated convolutions. 

If we apply inversion to the uniform problem then it turns out we get two obvious inversion
formulas. Of course these inversions agree on the set of densities of the form (3), but they are
different outside of this set. Plugging in a kernel estimator of the density g of the observations, 
which is typically not of this form, then yields two estimators of /. These can then in some 
sense be optimally combined in a convex combination. This approach is developed in Van Es 
(2010). Here we will follow this approach in the bivariate uniform deconvolution setting. 
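Let us now consider bivariate deconvolution. The bivariate convolution formula for $X_j = Y_j + Z_j$, where $X_j$, $Y_j$ and $Z_j$ stand for two dimensional random vectors, can be written in vector notation as

\[
g(\mathbf{x}) = \int_{\mathbb{R}^2}k(\mathbf{x}-\mathbf{u})f(\mathbf{u})\,d\mathbf{u}, \tag{4}
\]

where $g$, $k$ and $f$ denote the bivariate densities of $X_j$, $Z_j$ and $Y_j$ respectively.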

The estimation principles described above can in principle all be attempted in the bivariate 
problem as well. See for instance Youndje and Wells (2008) for recent results on multivariate 
Fourier type kernel deconvolution. Approaches based on nonparametric maximum likelihood 
and inversion hardly exist to our knowledge. 

In the bivariate uniform deconvolution setting the random vector $Z_j$ has a Uniform$([0,1)\times[0,1))$ distribution, i.e. it is uniformly distributed on the unit square. Here we can also express the bivariate density $g$ of the observations in terms of the bivariate distribution function $F$, with density $f$, of the random vector $Y$. We have

\[
\begin{aligned}
g(x_1,x_2) &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}I_{[0,1)}(x_1-u_1)I_{[0,1)}(x_2-u_2)f(u_1,u_2)\,du_1du_2 \\
&= \int_{x_2-1}^{x_2}\int_{x_1-1}^{x_1}f(u_1,u_2)\,du_1du_2 \\
&= F(x_1,x_2) - F(x_1,x_2-1) - F(x_1-1,x_2) + F(x_1-1,x_2-1).
\end{aligned} \tag{5}
\]

This is the bivariate analogue of formula (3). Note that, again, the Fourier inversion approach cannot be used because of the zeros in the characteristic function of the bivariate uniform distribution.

The main aim of this paper is to develop the inversion approach of Van Es (2010) for bivariate uniform deconvolution. In Chapter 2 we derive four inversion formulas for (5). This yields the same number of possible estimators if we plug in a density estimator of the density $g$ of the observations. In Chapter 3 we combine these estimators in a convex combination which is asymptotically optimal in some sense. The weights of this combination turn out to depend on the unknown distribution $F$. A general theorem for an estimator with estimated weights is given in Chapter 4. We also present specific estimators of these weights. Simulated examples are presented in Chapter 5. Chapter 6 contains the proofs.



2 Inversion formulas

Recall that the density of the $Z_j$ is equal to $k(z_1,z_2) = I_{[0,1)\times[0,1)}(z_1,z_2) = I_{[0,1)}(z_1)I_{[0,1)}(z_2)$. This yields formula (5) which expresses $g(x_1,x_2)$ in terms of $F(x_1,x_2)$. Lemma 2.1 below demonstrates that the converse is also feasible.

First note that for

\[
\begin{aligned}
F^{--}(y_1,y_2) &:= \Pr(Y_1 \le y_1, Y_2 \le y_2), \\
F^{-+}(y_1,y_2) &:= \Pr(Y_1 \le y_1, Y_2 > y_2), \\
F^{+-}(y_1,y_2) &:= \Pr(Y_1 > y_1, Y_2 \le y_2), \\
F^{++}(y_1,y_2) &:= \Pr(Y_1 > y_1, Y_2 > y_2),
\end{aligned}
\]

the following equalities hold

\[
F^{--}(x_1,x_2) = F(x_1,x_2), \tag{6}
\]
\[
F^{-+}(x_1,x_2) = F_{Y_1}(x_1) - F(x_1,x_2), \tag{7}
\]
\[
F^{+-}(x_1,x_2) = F_{Y_2}(x_2) - F(x_1,x_2), \tag{8}
\]
\[
F^{++}(x_1,x_2) = F(x_1,x_2) - F_{Y_1}(x_1) - F_{Y_2}(x_2) + 1. \tag{9}
\]

If we know $F(x_1,x_2)$ and if this function is continuously differentiable in $x_1$ and $x_2$, then we know $f(x_1,x_2)$, because $f(x_1,x_2) = \frac{\partial^2}{\partial x_1\partial x_2}F(x_1,x_2)$. In fact, combined with the formulas above, and (5), this gives us four different inversion formulas to obtain $f$ and $F$ from $g$, as is stated in the following lemma.

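Lemma 2.1 Assume that $\lim_{x_1\to\pm\infty}f(x_1,x_2) = 0$ and $\lim_{x_2\to\pm\infty}f(x_1,x_2) = 0$. Then we have

\[
F^{--}(x_1,x_2) = \sum_{i=0}^{\infty}\sum_{j=0}^{\infty}g(x_1-i,x_2-j), \tag{10}
\]
\[
F^{-+}(x_1,x_2) = \sum_{i=0}^{\infty}\sum_{j=1}^{\infty}g(x_1-i,x_2+j), \tag{11}
\]
\[
F^{+-}(x_1,x_2) = \sum_{i=1}^{\infty}\sum_{j=0}^{\infty}g(x_1+i,x_2-j), \tag{12}
\]
\[
F^{++}(x_1,x_2) = \sum_{i=1}^{\infty}\sum_{j=1}^{\infty}g(x_1+i,x_2+j). \tag{13}
\]

Furthermore, assuming that $g(x_1,x_2)$ is twice mixed continuously differentiable in $x_1$ and $x_2$, there are four inversion formulas to recover $f$ from $g$. We have

\[
f(x_1,x_2) = \sum_{i=0}^{\infty}\sum_{j=0}^{\infty}\frac{\partial^2}{\partial x_1\partial x_2}g(x_1-i,x_2-j), \tag{14}
\]
\[
f(x_1,x_2) = -\sum_{i=0}^{\infty}\sum_{j=1}^{\infty}\frac{\partial^2}{\partial x_1\partial x_2}g(x_1-i,x_2+j), \tag{15}
\]
\[
f(x_1,x_2) = -\sum_{i=1}^{\infty}\sum_{j=0}^{\infty}\frac{\partial^2}{\partial x_1\partial x_2}g(x_1+i,x_2-j), \tag{16}
\]
\[
f(x_1,x_2) = \sum_{i=1}^{\infty}\sum_{j=1}^{\infty}\frac{\partial^2}{\partial x_1\partial x_2}g(x_1+i,x_2+j). \tag{17}
\]

The minus signs in (15) and (16) follow from (7) and (8), which give $\frac{\partial^2}{\partial x_1\partial x_2}F^{-+} = \frac{\partial^2}{\partial x_1\partial x_2}F^{+-} = -f$.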

To get some more insight in these inversion formulas note that $g$ can be interpreted as a probability for $Y$ (under $F$). We have

\[
g(x_1,x_2) = P_F\big(Y \in (x_1-1,x_1]\times(x_2-1,x_2]\big).
\]

So $g(x_1,x_2)$ is equal to the probability that $Y$ belongs to a specific square $(x_1-1,x_1]\times(x_2-1,x_2]$. Adding up over suitable squares we then get the probability that $Y$ belongs to a specific quadrant with a given vertex. For a formal proof see Chapter 6.
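The quadrant interpretation can be verified numerically for a distribution where $F$ is known in closed form. The sketch below assumes, purely for illustration, that $Y$ has independent standard normal components, so that $F(x_1,x_2) = \Phi(x_1)\Phi(x_2)$; this choice and all function names are assumptions that do not appear in the paper. The infinite sums in (10) and (13) are truncated after a fixed number of terms.

```python
import math

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def g(x1, x2):
    """Density of X = Y + Z via formula (5), for F(x1, x2) = Phi(x1) * Phi(x2)."""
    return (Phi(x1) - Phi(x1 - 1.0)) * (Phi(x2) - Phi(x2 - 1.0))

def F_mm_by_inversion(x1, x2, terms=60):
    """Truncated version of (10): sum over i, j >= 0 of g(x1 - i, x2 - j)."""
    return sum(g(x1 - i, x2 - j) for i in range(terms) for j in range(terms))

def F_pp_by_inversion(x1, x2, terms=60):
    """Truncated version of (13): sum over i, j >= 1 of g(x1 + i, x2 + j)."""
    return sum(g(x1 + i, x2 + j)
               for i in range(1, terms + 1) for j in range(1, terms + 1))

x1, x2 = 0.3, -0.4
lower_left = F_mm_by_inversion(x1, x2)    # should match Phi(x1) * Phi(x2)
upper_right = F_pp_by_inversion(x1, x2)   # should match (1 - Phi(x1)) * (1 - Phi(x2))
```

Each sum adds up the probabilities of the unit squares that tile the corresponding quadrant, which is exactly the telescoping argument behind Lemma 2.1.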
Figure 1: $F^{++}(x_1,x_2) = \sum_{i=1}^{\infty}\sum_{j=1}^{\infty}P_F\big(Y \in (x_1+i-1,x_1+i]\times(x_2+j-1,x_2+j]\big)$.



3 Estimation of the density function

In the previous chapter we have derived inversion formulas that express the density $f$ in terms of the density $g$ of the observations. Now we can use an estimator of $g$, for which we have observations, to estimate $f$. For an arbitrary density that is not of the form (5) the inversions will in general not yield distribution functions or densities, nor will they coincide. This typically happens if we estimate $g$.

We use kernel smoothing but of course other estimators can be used as well. Let us introduce a bivariate kernel density estimator with bivariate kernel function $w$ and bandwidth $h > 0$. The estimator $g_{nh}$ of $g$ is given by

\[
g_{nh}(x_1,x_2) = \frac{1}{nh^2}\sum_{k=1}^{n}w\Big(\frac{x_1-X_{k1}}{h},\frac{x_2-X_{k2}}{h}\Big). \tag{18}
\]

Usually, $w$ is chosen to be a bivariate probability density function. This way it is ensured that $g_{nh}$ is also a density. See for instance Silverman (1986) and Wand and Jones (1995).

We impose the following condition on the kernel function.

Condition W

The function $w$ is a probability density function on $\mathbb{R}^2$ with support $[-1,1]\times[-1,1]$. Furthermore, we will use a product kernel $w(u_1,u_2) = w(u_1)w(u_2)$, where $w(u_i)$, with $i \in \{1,2\}$, denotes a continuously differentiable univariate symmetric probability density function.
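Plugging in the kernel estimator in the four inversion formulas of Lemma 2.1 we get four kernel estimators of the density given by

\[
\begin{aligned}
f^{--}_{nh}(x_1,x_2) &= \frac{1}{nh^4}\sum_{k=1}^{n}\sum_{i=0}^{\infty}\sum_{j=0}^{\infty}w'\Big(\frac{x_1-i-X_{k1}}{h}\Big)w'\Big(\frac{x_2-j-X_{k2}}{h}\Big), \\
f^{-+}_{nh}(x_1,x_2) &= -\frac{1}{nh^4}\sum_{k=1}^{n}\sum_{i=0}^{\infty}\sum_{j=1}^{\infty}w'\Big(\frac{x_1-i-X_{k1}}{h}\Big)w'\Big(\frac{x_2+j-X_{k2}}{h}\Big), \\
f^{+-}_{nh}(x_1,x_2) &= -\frac{1}{nh^4}\sum_{k=1}^{n}\sum_{i=1}^{\infty}\sum_{j=0}^{\infty}w'\Big(\frac{x_1+i-X_{k1}}{h}\Big)w'\Big(\frac{x_2-j-X_{k2}}{h}\Big), \\
f^{++}_{nh}(x_1,x_2) &= \frac{1}{nh^4}\sum_{k=1}^{n}\sum_{i=1}^{\infty}\sum_{j=1}^{\infty}w'\Big(\frac{x_1+i-X_{k1}}{h}\Big)w'\Big(\frac{x_2+j-X_{k2}}{h}\Big).
\end{aligned}
\]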



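We will derive $f^{++}_{nh}(x_1,x_2)$. The other three estimators follow similarly. Define $w'(u) := \frac{d}{du}w(u)$. Lemma 2.1 in combination with $\frac{\partial^2}{\partial x_1\partial x_2}F(x_1,x_2) = f(x_1,x_2)$ gives us

\[
\begin{aligned}
f^{++}_{nh}(x_1,x_2) &= \sum_{i=1}^{\infty}\sum_{j=1}^{\infty}\frac{\partial^2}{\partial x_1\partial x_2}g_{nh}(x_1+i,x_2+j) \\
&= \sum_{i=1}^{\infty}\sum_{j=1}^{\infty}\frac{\partial^2}{\partial x_1\partial x_2}\frac{1}{nh^2}\sum_{k=1}^{n}w\Big(\frac{x_1+i-X_{k1}}{h},\frac{x_2+j-X_{k2}}{h}\Big) \\
&= \frac{1}{nh^4}\sum_{k=1}^{n}\sum_{i=1}^{\infty}\sum_{j=1}^{\infty}w'\Big(\frac{x_1+i-X_{k1}}{h}\Big)w'\Big(\frac{x_2+j-X_{k2}}{h}\Big).
\end{aligned}
\]

Note that, because of the bounded support of $w$, the sum is in fact a finite sum. In the last step we used the fact that $w$ is a product kernel, and thus $\frac{\partial^2}{\partial u_1\partial u_2}w(u_1,u_2) = w'(u_1)w'(u_2)$.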
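A minimal sketch of how $f^{++}_{nh}$ could be computed is given below. It assumes the biweight kernel $w(u) = \frac{15}{16}(1-u^2)^2$ on $[-1,1]$, which satisfies Condition W but is an illustrative choice that the paper does not prescribe; the function names are likewise hypothetical. Because $w$ is a product kernel, the double sum over $i$ and $j$ factorizes into a product of two univariate sums, and the bounded support leaves only finitely many nonzero terms.

```python
import math

def biweight_deriv(u):
    """Derivative w'(u) of the biweight kernel w(u) = 15/16 (1 - u^2)^2 on [-1, 1]."""
    return -3.75 * u * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def shifted_sum(x, obs, h, start):
    """Sum over integers i >= start of w'((x + i - obs) / h).

    Only indices with |x + i - obs| <= h contribute, so the sum is finite.
    """
    lo = max(start, math.ceil(obs - x - h))
    hi = math.floor(obs - x + h)
    return sum(biweight_deriv((x + i - obs) / h) for i in range(lo, hi + 1))

def f_pp(data, x1, x2, h):
    """Estimator f_nh^{++}(x1, x2) for data given as a list of pairs (X_k1, X_k2)."""
    total = sum(shifted_sum(x1, xk1, h, 1) * shifted_sum(x2, xk2, h, 1)
                for xk1, xk2 in data)
    return total / (len(data) * h ** 4)

# A single observation at (1, 1): only the shift i = j = 1 contributes at (0.1, 0.1).
value = f_pp([(1.0, 1.0)], 0.1, 0.1, 0.5)
```

The other three estimators can be obtained in the same way by changing the sign of the shifts, the starting index of the sums, and, for $f^{-+}_{nh}$ and $f^{+-}_{nh}$, the overall sign.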
Next we introduce a convex combination of the four previous estimators. Write

\[
f^{t}_{nh}(x_1,x_2) = t_1f^{--}_{nh}(x_1,x_2) + t_2f^{-+}_{nh}(x_1,x_2) + t_3f^{+-}_{nh}(x_1,x_2) + t_4f^{++}_{nh}(x_1,x_2), \tag{19}
\]

where $t = (t_1,t_2,t_3,t_4)$ and $t_1+t_2+t_3+t_4 = 1$. For suitable choices of $t_1,t_2,t_3,t_4$ this combination will turn out to have better properties than any of the estimators separately. Notice that when we set $t_1$, $t_2$, $t_3$ or $t_4$ equal to one and the others equal to zero, we get results for $f^{--}_{nh}$, $f^{-+}_{nh}$, $f^{+-}_{nh}$ or $f^{++}_{nh}$ individually.

Theorem 3.1 Assume that Condition W is satisfied, that $f$ is bounded, and that $\lim_{x_1\to\pm\infty}f(x_1,x_2) = \lim_{x_2\to\pm\infty}f(x_1,x_2) = 0$. If $f$ is twice continuously differentiable on a neighborhood of $x = (x_1,x_2)$ then, as $n \to \infty$, $h \to 0$, $nh \to \infty$, we have

\[
\mathrm{E} f^{t}_{nh}(x_1,x_2) = f(x_1,x_2) + \tfrac12h^2\int_{-1}^{1}z^2w(z)\,dz\,(f_{11}+f_{22})(x_1,x_2) + o(h^2). \tag{20}
\]

Furthermore, as $n \to \infty$, $h \to 0$, $nh \to \infty$, we have

\[
\mathrm{Var}\big(f^{t}_{nh}(x_1,x_2)\big) = \frac{1}{nh^6}B(x_1,x_2,t_1,t_2,t_3,t_4)\Big(\int_{-1}^{1}w'(z)^2dz\Big)^2 + o(n^{-1}h^{-6}), \tag{21}
\]

where

\[
B(x_1,x_2,t_1,t_2,t_3,t_4) = \big(t_1^2F^{--} + t_2^2F^{-+} + t_3^2F^{+-} + t_4^2F^{++}\big)(x_1,x_2). \tag{22}
\]

From the theorem we see that the expectation of $f^{t}_{nh}(x_1,x_2)$ is the same whatever convex combination we choose. Lemma 3.2 gives the weights that minimize the leading term in the variance (21).

Lemma 3.2 Assume that $(x_1,x_2)$ is an interior point of the support of $f$. The weights $t_1$, $t_2$, $t_3$ and $t_4$, with $t_1+t_2+t_3+t_4 = 1$, that minimize the leading term in the variance (21), are denoted by $\bar t_1(x_1,x_2)$, $\bar t_2(x_1,x_2)$, $\bar t_3(x_1,x_2)$ and $\bar t_4(x_1,x_2)$ and they are equal to

\[
\begin{aligned}
\bar t_1(x_1,x_2) &= F^{-+,+-,++}(x_1,x_2)\,A(x_1,x_2), \\
\bar t_2(x_1,x_2) &= F^{--,+-,++}(x_1,x_2)\,A(x_1,x_2), \\
\bar t_3(x_1,x_2) &= F^{--,-+,++}(x_1,x_2)\,A(x_1,x_2), \\
\bar t_4(x_1,x_2) &= F^{--,-+,+-}(x_1,x_2)\,A(x_1,x_2).
\end{aligned}
\]

The resulting variance of this optimal convex combination is then equal to

\[
\mathrm{Var}\big(f^{\bar t}_{nh}(x_1,x_2)\big) = A(x_1,x_2)\,C(x_1,x_2)\,\frac{1}{nh^6}\Big(\int_{-1}^{1}w'(z)^2dz\Big)^2 + o(n^{-1}h^{-6}). \tag{23}
\]

Here

\[
A(x_1,x_2) := \big(F^{-+,+-,++} + F^{--,+-,++} + F^{--,-+,++} + F^{--,-+,+-}\big)^{-1}(x_1,x_2), \tag{24}
\]

where, for $a_1,a_2,b_1,b_2,c_1,c_2 \in \{-,+\}$,

\[
F^{a_1a_2,b_1b_2,c_1c_2}(x_1,x_2) := F^{a_1a_2}(x_1,x_2)\,F^{b_1b_2}(x_1,x_2)\,F^{c_1c_2}(x_1,x_2), \tag{25}
\]

and

\[
C(x_1,x_2) := F^{--}(x_1,x_2)\,F^{-+}(x_1,x_2)\,F^{+-}(x_1,x_2)\,F^{++}(x_1,x_2). \tag{26}
\]

Proof

First note that the weights are well defined since the fact that $(x_1,x_2)$ is an interior point of the support of $f$ implies that $F^{--}(x_1,x_2)$, $F^{-+}(x_1,x_2)$, $F^{+-}(x_1,x_2)$ and $F^{++}(x_1,x_2)$ are strictly positive. The lower bound now follows from Lemma 6.2 in Chapter 6. □

Note that in general, of course, we do not know $F$. However, in Section 4 we show that we can estimate $F^{--}(x_1,x_2)$, $F^{-+}(x_1,x_2)$, $F^{+-}(x_1,x_2)$ and $F^{++}(x_1,x_2)$, again using the inversion formulas of Lemma 2.1. This will lead to estimates of the optimal weights. We then prove that the estimator with estimated weights shares the properties of Theorem 3.1 with the optimal weights.
4 The final estimator with estimated optimal weights

Let us write $\hat t_n(x_1,x_2) = (\hat t_{n1}(x_1,x_2),\dots,\hat t_{n4}(x_1,x_2))$ for a vector of estimated weights. The next theorem shows that under some conditions on these estimators the limit behaviour of $f^{\hat t_n}_{nh}(x_1,x_2)$ resembles the optimal limit behaviour of the estimator $f^{\bar t}_{nh}(x_1,x_2)$.

Theorem 4.1 Assume that Condition W is satisfied, that $f$ is bounded, and that $\lim_{x_1\to\pm\infty}f(x_1,x_2) = \lim_{x_2\to\pm\infty}f(x_1,x_2) = 0$. Assume for $i = 1,\dots,4$,

\[
\mathrm{E}\big(\hat t_{ni}(x_1,x_2) - \bar t_i(x_1,x_2)\big)^2 = o(nh^{10}). \tag{27}
\]

If $f$ is twice continuously differentiable on a neighborhood of $x = (x_1,x_2)$ then, as $n \to \infty$, $h \to 0$, $nh \to \infty$, we have

\[
\mathrm{E} f^{\hat t_n}_{nh}(x_1,x_2) = f(x_1,x_2) + \tfrac12h^2\int_{-1}^{1}z^2w(z)\,dz\,(f_{11}+f_{22})(x_1,x_2) + o(h^2). \tag{28}
\]

Assume for $i = 1,\dots,4$,

\[
\mathrm{E}\big(\hat t_{ni}(x_1,x_2) - \bar t_i(x_1,x_2)\big)^4 = o(1). \tag{29}
\]

Then, as $n \to \infty$, $h \to 0$, $nh \to \infty$, we have

\[
\mathrm{Var}\big(f^{\hat t_n}_{nh}(x_1,x_2)\big) = \frac{1}{nh^6}\sigma(x_1,x_2)^2 + o(n^{-1}h^{-6}), \tag{30}
\]

where, with the notation of Lemma 3.2, $\sigma(x_1,x_2)^2$ is defined by

\[
\sigma(x_1,x_2)^2 = A(x_1,x_2)\,C(x_1,x_2)\Big(\int_{-1}^{1}w'(z)^2dz\Big)^2. \tag{31}
\]

Assume for $i = 1,\dots,4$,

\[
\mathrm{E}\big(\hat t_{ni}(x_1,x_2) - \bar t_i(x_1,x_2)\big)^2 = o(1). \tag{32}
\]

Then the estimator is asymptotically normally distributed. We have, as $n \to \infty$, $h \to 0$, $nh \to \infty$,

\[
\sqrt{nh^6}\,\big(f^{\hat t_n}_{nh}(x_1,x_2) - \mathrm{E} f^{\hat t_n}_{nh}(x_1,x_2)\big) \xrightarrow{\ \mathcal{D}\ } N\big(0,\sigma(x_1,x_2)^2\big). \tag{33}
\]

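Let us next construct suitable estimators of the weights based on the estimators of $F^{--}$, $F^{-+}$, $F^{+-}$ and $F^{++}$. As in estimation of the density we can plug in (18) into the inversion formulas for $F$ in Lemma 2.1 and get kernel estimators of $F^{--}(x_1,x_2)$, $F^{-+}(x_1,x_2)$, $F^{+-}(x_1,x_2)$ and $F^{++}(x_1,x_2)$. We get four estimators, given by

\[
\begin{aligned}
\hat F^{--}_{nh}(x_1,x_2) &= \frac{1}{nh^2}\sum_{k=1}^{n}\sum_{i=0}^{\infty}\sum_{j=0}^{\infty}w\Big(\frac{x_1-i-X_{k1}}{h}\Big)w\Big(\frac{x_2-j-X_{k2}}{h}\Big), \\
\hat F^{-+}_{nh}(x_1,x_2) &= \frac{1}{nh^2}\sum_{k=1}^{n}\sum_{i=0}^{\infty}\sum_{j=1}^{\infty}w\Big(\frac{x_1-i-X_{k1}}{h}\Big)w\Big(\frac{x_2+j-X_{k2}}{h}\Big), \\
\hat F^{+-}_{nh}(x_1,x_2) &= \frac{1}{nh^2}\sum_{k=1}^{n}\sum_{i=1}^{\infty}\sum_{j=0}^{\infty}w\Big(\frac{x_1+i-X_{k1}}{h}\Big)w\Big(\frac{x_2-j-X_{k2}}{h}\Big), \\
\hat F^{++}_{nh}(x_1,x_2) &= \frac{1}{nh^2}\sum_{k=1}^{n}\sum_{i=1}^{\infty}\sum_{j=1}^{\infty}w\Big(\frac{x_1+i-X_{k1}}{h}\Big)w\Big(\frac{x_2+j-X_{k2}}{h}\Big).
\end{aligned}
\]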



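The following theorem establishes the asymptotic bias and variance of these four estimators.

Theorem 4.2 Assume that Condition W is satisfied. Then, as $n \to \infty$, $h \to 0$, $nh \to \infty$, we have

\[
\begin{aligned}
\mathrm{E}\hat F^{--}_{nh}(x_1,x_2) &= F^{--}(x_1,x_2) + \tfrac12h^2\big(F^{--}_{11}+F^{--}_{22}\big)(x_1,x_2)\int_{-1}^{1}z^2w(z)\,dz + o(h^2), \\
\mathrm{E}\hat F^{-+}_{nh}(x_1,x_2) &= F^{-+}(x_1,x_2) + \tfrac12h^2\big(F^{-+}_{11}+F^{-+}_{22}\big)(x_1,x_2)\int_{-1}^{1}z^2w(z)\,dz + o(h^2), \\
\mathrm{E}\hat F^{+-}_{nh}(x_1,x_2) &= F^{+-}(x_1,x_2) + \tfrac12h^2\big(F^{+-}_{11}+F^{+-}_{22}\big)(x_1,x_2)\int_{-1}^{1}z^2w(z)\,dz + o(h^2), \\
\mathrm{E}\hat F^{++}_{nh}(x_1,x_2) &= F^{++}(x_1,x_2) + \tfrac12h^2\big(F^{++}_{11}+F^{++}_{22}\big)(x_1,x_2)\int_{-1}^{1}z^2w(z)\,dz + o(h^2),
\end{aligned}
\]

where $F^{--}_{11} = \frac{\partial^2F^{--}}{\partial x_1^2}$ and $F^{--}_{22} = \frac{\partial^2F^{--}}{\partial x_2^2}$, etc. For the variances we have

\[
\begin{aligned}
\mathrm{Var}\big(\hat F^{--}_{nh}(x_1,x_2)\big) &= F^{--}(x_1,x_2)\,\frac{1}{nh^2}\Big(\int_{-1}^{1}w^2(z)\,dz\Big)^2 + o\Big(\frac{1}{nh^2}\Big), \\
\mathrm{Var}\big(\hat F^{-+}_{nh}(x_1,x_2)\big) &= F^{-+}(x_1,x_2)\,\frac{1}{nh^2}\Big(\int_{-1}^{1}w^2(z)\,dz\Big)^2 + o\Big(\frac{1}{nh^2}\Big), \\
\mathrm{Var}\big(\hat F^{+-}_{nh}(x_1,x_2)\big) &= F^{+-}(x_1,x_2)\,\frac{1}{nh^2}\Big(\int_{-1}^{1}w^2(z)\,dz\Big)^2 + o\Big(\frac{1}{nh^2}\Big), \\
\mathrm{Var}\big(\hat F^{++}_{nh}(x_1,x_2)\big) &= F^{++}(x_1,x_2)\,\frac{1}{nh^2}\Big(\int_{-1}^{1}w^2(z)\,dz\Big)^2 + o\Big(\frac{1}{nh^2}\Big).
\end{aligned}
\]

For the proof of this theorem see Chapter 6.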

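Next we write the optimal weights of Lemma 3.2 in terms of functions $t_i$ defined by

\[
\bar t_i(x_1,x_2) = t_i\big(F^{--}(x_1,x_2),F^{-+}(x_1,x_2),F^{+-}(x_1,x_2),F^{++}(x_1,x_2)\big), \quad i = 1,\dots,4.
\]

Let $(\epsilon_n)$ denote a sequence of numbers with $0 < \epsilon_n < 1$ and $\epsilon_n \to 0$ as $n \to \infty$. Then define truncated versions of the estimators $\hat F^{--}_{n\tilde h}(x_1,x_2)$, $\hat F^{-+}_{n\tilde h}(x_1,x_2)$, $\hat F^{+-}_{n\tilde h}(x_1,x_2)$ and $\hat F^{++}_{n\tilde h}(x_1,x_2)$ by

\[
\begin{aligned}
\tilde F^{--}_{n\tilde h}(x_1,x_2) &= \min\big(\max(\hat F^{--}_{n\tilde h}(x_1,x_2),\epsilon_n),1\big), \\
\tilde F^{-+}_{n\tilde h}(x_1,x_2) &= \min\big(\max(\hat F^{-+}_{n\tilde h}(x_1,x_2),\epsilon_n),1\big), \\
\tilde F^{+-}_{n\tilde h}(x_1,x_2) &= \min\big(\max(\hat F^{+-}_{n\tilde h}(x_1,x_2),\epsilon_n),1\big), \\
\tilde F^{++}_{n\tilde h}(x_1,x_2) &= \min\big(\max(\hat F^{++}_{n\tilde h}(x_1,x_2),\epsilon_n),1\big).
\end{aligned}
\]

Since the bandwidth used in the estimators of the weights can in general be different from the bandwidth $h$ used in the estimator of $f$, we denote this bandwidth by $\tilde h$. We now obtain estimators of the weights by plugging in these estimators. We get

\[
\hat t_{ni}(x_1,x_2) = t_i\big(\tilde F^{--}_{n\tilde h}(x_1,x_2),\tilde F^{-+}_{n\tilde h}(x_1,x_2),\tilde F^{+-}_{n\tilde h}(x_1,x_2),\tilde F^{++}_{n\tilde h}(x_1,x_2)\big), \quad i = 1,\dots,4.
\]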
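The truncation and plug-in step can be sketched as follows. This is a minimal illustration under the assumption that the four kernel estimates of the quadrant probabilities have already been computed; the function names and the numerical inputs are hypothetical. The truncation at $\epsilon_n$ guards against estimates that are zero or negative, which would make the weights of Lemma 3.2 undefined.

```python
def truncate(value, eps):
    """Truncated estimate min(max(value, eps), 1)."""
    return min(max(value, eps), 1.0)

def plug_in_weights(F_mm, F_mp, F_pm, F_pp, eps):
    """Estimated optimal weights (t1, t2, t3, t4) from the four quadrant estimates.

    Each weight is the product of the three *other* truncated estimates,
    normalized so that the weights sum to one (Lemma 3.2).
    """
    a = [truncate(v, eps) for v in (F_mm, F_mp, F_pm, F_pp)]
    products = [a[1] * a[2] * a[3],
                a[0] * a[2] * a[3],
                a[0] * a[1] * a[3],
                a[0] * a[1] * a[2]]
    total = sum(products)
    return [p / total for p in products]

# Hypothetical estimates; the first one is negative and is truncated to eps.
weights = plug_in_weights(-0.05, 0.5, 0.5, 0.2, eps=0.01)
```

In this example the estimate of $F^{--}$ is truncated to $\epsilon_n$, so almost all weight goes to $f^{--}_{nh}$, the estimator whose variance is proportional to the smallest quadrant probability.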

The next lemma shows that these estimators, with a suitable bandwidth, can be used to estimate the optimal weights without disturbing the asymptotics of Theorem 3.1.

Lemma 4.3 If $h \asymp n^{-1/10}$, $\epsilon_n = 1/\log n$, and if we use a bandwidth $\tilde h$ of the form $\tilde h = c\,n^{-1/6}$, where $c$ is a constant, then the estimators

\[
\hat t_{ni}(x_1,x_2) = t_i\big(\tilde F^{--}_{n\tilde h}(x_1,x_2), \tilde F^{-+}_{n\tilde h}(x_1,x_2), \tilde F^{+-}_{n\tilde h}(x_1,x_2), \tilde F^{++}_{n\tilde h}(x_1,x_2)\big)
\]

satisfy H^j, ^ and 

If we compare the performance of our final estimator with estimated optimal weights to the performance of the four individual estimators, then we see that the first order term of the expectation is the same. The variance of the combined estimator contains a term that is equal to the product of $F^{--}(x_1,x_2)$, $F^{-+}(x_1,x_2)$, $F^{+-}(x_1,x_2)$ and $F^{++}(x_1,x_2)$. This shows that the variance is small along the edge of the support of $f$. By Theorem 3.1 the variance of, for instance, $f^{--}_{nh}(x_1,x_2)$ is proportional to $F^{--}(x_1,x_2)$. This shows that this estimator will perform better in the lower left of the support of $f$ than it will in the other parts. By using the estimated optimal convex combination the worse behavior of the four individual estimators in certain areas is reduced.

If we minimize the pointwise asymptotic mean squared error of the final estimator at $(x_1,x_2)$, and thus balance its asymptotic squared bias and its asymptotic variance given by Theorem 4.1, then we see that the optimal bandwidth is of order $n^{-1/10}$. The corresponding mean squared error is then of order $n^{-2/5}$. This of course raises the problem of bandwidth selection, which we will not pursue here.
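The order of the optimal bandwidth can be checked with a short calculation. The following is a sketch in which $c_1$ and $c_2$ are shorthand (not notation from the paper) for the constants in front of the squared bias, of order $h^4$, and of the variance, of order $1/(nh^6)$:

```latex
\[
\mathrm{AMSE}(h) = c_1 h^4 + \frac{c_2}{n h^6},
\qquad
\frac{d}{dh}\,\mathrm{AMSE}(h) = 4 c_1 h^3 - \frac{6 c_2}{n h^7} = 0
\;\Longleftrightarrow\;
h^{10} = \frac{3 c_2}{2 c_1 n},
\]
\[
h_{\mathrm{opt}} = \Big(\frac{3 c_2}{2 c_1}\Big)^{1/10} n^{-1/10},
\qquad
\mathrm{AMSE}(h_{\mathrm{opt}}) = O\big(n^{-2/5}\big).
\]
```

Both terms of the asymptotic mean squared error are of order $n^{-2/5}$ at this bandwidth, since $h^4 \asymp n^{-4/10}$ and $1/(nh^6) \asymp n^{-1+6/10}$.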
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5 Simulated examples 



To illustrate the estimator we have simulated two examples. In the first example the density $f$ is unimodal. In the second example $f$ is a mixture of two unimodal bivariate densities, rendering it bimodal. In the first example $f$ is concentrated on the square [0.25, 1.75] x [0.25, 1.75]. In the second example $f$ is concentrated on the square [0.2, 1.8] x [0.2, 1.8]. This means that both deconvolution problems are not at all trivial.

To speed up computations we have followed the bivariate binning technique as advised in Wand (1994). For the x and y coordinates we have chosen a grid of 500 points between -1 and 4. We have used a product kernel based on the so-called biweight kernel given by

\[
w(u) = \frac{15}{16}\,(1-u^2)^2\, 1_{[-1,1]}(u). \tag{35}
\]
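The kernel constants that enter the bias and variance expansions can be checked numerically for the biweight kernel. The sketch below uses a plain midpoint rule; the helper names are illustrative. For this kernel the integral of $z^2 w(z)$ is 1/7, the integral of $w^2$ is 5/7 and the integral of $(w')^2$ is 15/7.

```python
import numpy as np


def biweight(u):
    """Biweight kernel w(u) = 15/16 (1 - u^2)^2 on [-1, 1], zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)


def biweight_derivative(u):
    """Derivative w'(u) = -15/4 u (1 - u^2) on [-1, 1], zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, -15.0 / 4.0 * u * (1.0 - u**2), 0.0)


def midpoint_integral(func, a=-1.0, b=1.0, m=200_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    width = (b - a) / m
    z = a + (np.arange(m) + 0.5) * width
    return float(np.sum(func(z)) * width)


total_mass = midpoint_integral(biweight)
second_moment = midpoint_integral(lambda z: z**2 * biweight(z))
squared_kernel = midpoint_integral(lambda z: biweight(z) ** 2)
squared_derivative = midpoint_integral(lambda z: biweight_derivative(z) ** 2)
```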

Example 5.1 In our first example $f$ is the density of the random vector $(Y_1, Y_2)$, where $Y_1$ and $Y_2$ are two independent random variables that each have a certain shifted and rescaled beta distribution. To be more specific, $Y_i = 0.25 + 1.5\,V_i$, $i = 1,2$, where the $V_i$ are independent and both Beta(3,3) distributed. We have simulated 1000 values, so $n = 1000$. The bandwidth $h$, chosen by hand, is equal to 0.5.
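The data of this example can be generated along the following lines. The sketch assumes that the uniform errors live on the unit square, which is what the integer shifts in the inversion formulas suggest; the function name and the seed are illustrative only.

```python
import numpy as np


def simulate_example_5_1(n=1000, seed=0):
    """Draw (Y, X) with Y_i = 0.25 + 1.5 V_i, V_i ~ Beta(3, 3), and X = Y + Z.

    Z is assumed to be uniform on the unit square [0, 1) x [0, 1).
    """
    rng = np.random.default_rng(seed)
    V = rng.beta(3.0, 3.0, size=(n, 2))      # independent Beta(3, 3) coordinates
    Y = 0.25 + 1.5 * V                       # unobservable variables on [0.25, 1.75]^2
    Z = rng.uniform(0.0, 1.0, size=(n, 2))   # uniform measurement error
    return Y, Y + Z


Y, X = simulate_example_5_1()
```

Only `X` would be available to the estimator; `Y` is returned so that the estimate can be compared with the truth.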

The true density $f$ and its estimate are given in Figure 2. The difference between the true density and the estimate is plotted in Figure 3. The right plot in Figure 3 shows $f^{+-}_{nh}$. Clearly this estimate is best in the $+-$ quadrant, as predicted by the theory.




Figure 2: Left: the true density. Right: the estimate. 



Figure 3: Left: the difference of the true density and the estimate. Right: $f^{+-}_{nh}$.

Example 5.2 In our second example $f$ is the density of the random vector $(Y_1, Y_2)$, where $Y_1$ and $Y_2$ are dependent random variables with a bimodal distribution. The distribution of the vector is a mixture of two distributions like the one in Example 5.1. The values of the $Y$'s are generated as follows. With $V_1$ and $V_2$ having the same distribution as in the previous example, the $Y$ values are given by

\[
\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} =
\begin{cases}
\begin{pmatrix} V_1 + 0.2 \\ V_2 + 0.8 \end{pmatrix}, & \text{with probability } 2/5, \\[2ex]
\begin{pmatrix} V_1 + 0.8 \\ V_2 + 0.2 \end{pmatrix}, & \text{with probability } 3/5.
\end{cases}
\]

We have simulated 5000 values, so $n = 5000$. The bandwidth $h$, chosen by hand, is equal to 0.35.

The true density $f$ and its estimate are given in Figure 4. The difference between the true density and the estimate is plotted in Figure 5. The right plot in Figure 5 shows $f^{-+}_{nh}$. Clearly this estimate is best in the $-+$ quadrant, as predicted by the theory.

Figure 4: n = 5000, h = 0.35. Left: the true density. Right: the estimate.

6 Proofs

6.1 Proof of Lemma 2.1

Let us first determine the inversion formulas for $F(x_1,x_2)$. We sum $g(x_1-i,x_2) = F(x_1-i,x_2) - F(x_1-i,x_2-1) - F(x_1-i-1,x_2) + F(x_1-i-1,x_2-1)$ over the first coordinate to obtain two telescopic sums. Thus we get

\begin{align*}
\sum_{i=0}^{\infty} g(x_1-i,x_2)
&= \sum_{i=0}^{\infty}\{F(x_1-i,x_2) - F(x_1-i,x_2-1) - F(x_1-i-1,x_2) + F(x_1-i-1,x_2-1)\} \\
&= \sum_{i=0}^{\infty}\{F(x_1-i,x_2) - F(x_1-i-1,x_2)\} - \sum_{i=0}^{\infty}\{F(x_1-i,x_2-1) - F(x_1-i-1,x_2-1)\} \\
&= F(x_1,x_2) - F(x_1,x_2-1). \tag{36}
\end{align*}

Here we used that $\lim_{i\to\infty} F(x_1-i,x_2) = \lim_{i\to\infty} F(x_1-i,x_2-1) = 0$, for $F$ is a bivariate distribution function. Next, we sum over the second coordinate. Because we also have $\lim_{j\to\infty} F(x_1,x_2-j) = 0$, we get

\[
\sum_{j=0}^{\infty}\sum_{i=0}^{\infty} g(x_1-i,x_2-j) = \sum_{j=0}^{\infty}\{F(x_1,x_2-j) - F(x_1,x_2-j-1)\} = F(x_1,x_2). \tag{37}
\]

Because the terms are nonnegative, the order of summation can be interchanged and we have shown (10). Thus we have found an expression for the unobservable probability distribution function $F$ in terms of the observable density function $g$.

Above, we iterated over $-i$, so now let us determine what happens when we iterate over $+i$. First, we write $g(x_1+i,x_2)$ as

\[
g(x_1+i,x_2) = F(x_1+i,x_2) - F(x_1+i,x_2-1) - F(x_1+i-1,x_2) + F(x_1+i-1,x_2-1). \tag{38}
\]

Secondly, we take the sum over the first coordinate. Again we get two telescopic sums. Note that $\lim_{i\to\infty} F(x_1+i,x_2) = F_{Y_2}(x_2)$ and $\lim_{i\to\infty} F(x_1+i,x_2-1) = F_{Y_2}(x_2-1)$, so we get

\begin{align*}
\sum_{i=1}^{\infty} g(x_1+i,x_2)
&= \sum_{i=1}^{\infty}\{F(x_1+i,x_2) - F(x_1+i,x_2-1) - F(x_1+i-1,x_2) + F(x_1+i-1,x_2-1)\} \\
&= \sum_{i=1}^{\infty}\{F(x_1+i,x_2) - F(x_1+i-1,x_2)\} + \sum_{i=1}^{\infty}\{F(x_1+i-1,x_2-1) - F(x_1+i,x_2-1)\} \\
&= F_{Y_2}(x_2) - F(x_1,x_2) + F(x_1,x_2-1) - F_{Y_2}(x_2-1). \tag{39}
\end{align*}

Figure 5: Left: the difference of the true density and the estimate. Right: $f^{-+}_{nh}$.

Thirdly, we sum over the second coordinate. Because $\lim_{j\to\infty} F_{Y_2}(x_2-j) = 0$, this results in

\begin{align*}
\sum_{j=0}^{\infty}\sum_{i=1}^{\infty} g(x_1+i,x_2-j)
&= \sum_{j=0}^{\infty}\{F_{Y_2}(x_2-j) - F(x_1,x_2-j) + F(x_1,x_2-j-1) - F_{Y_2}(x_2-j-1)\} \\
&= \sum_{j=0}^{\infty}\{F_{Y_2}(x_2-j) - F_{Y_2}(x_2-j-1)\} - \sum_{j=0}^{\infty}\{F(x_1,x_2-j) - F(x_1,x_2-j-1)\} \\
&= F_{Y_2}(x_2) - F(x_1,x_2) = F^{+-}(x_1,x_2). \tag{40}
\end{align*}

Again, we can interchange the sums and we have shown (12). In similar fashion we can derive (11).
The last formula to recover $F(x_1,x_2)$ can be derived as follows. We begin with

\[
g(x_1+1,x_2+1) = F(x_1,x_2) - F(x_1,x_2+1) - F(x_1+1,x_2) + F(x_1+1,x_2+1). \tag{41}
\]

Now sum over the first coordinate to obtain

\[
\sum_{i=1}^{\infty} g(x_1+i,x_2+1) = F(x_1,x_2) - F(x_1,x_2+1) - F_{Y_2}(x_2) + F_{Y_2}(x_2+1). \tag{42}
\]

Summing over the second coordinate we get

\[
\sum_{j=1}^{\infty}\sum_{i=1}^{\infty} g(x_1+i,x_2+j) = F(x_1,x_2) - F_{Y_1}(x_1) - F_{Y_2}(x_2) + 1 = F^{++}(x_1,x_2). \tag{43}
\]

Changing the order of summation again, we obtain (13).
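The four inversion formulas for $F$ are easy to verify numerically. The sketch below takes the distribution of Example 5.1 as a test case, using the closed form $10v^3 - 15v^4 + 6v^5$ of the Beta(3,3) distribution function, and builds $g$ from $F$ exactly as in the proof. All function names are illustrative, and the sums are truncated at `max_shift`, which is exact here because $F$ has bounded support.

```python
import numpy as np


def beta33_cdf(v):
    """Distribution function of Beta(3, 3): 10 v^3 - 15 v^4 + 6 v^5 on [0, 1]."""
    v = np.clip(v, 0.0, 1.0)
    return 10.0 * v**3 - 15.0 * v**4 + 6.0 * v**5


def F_marginal(x):
    """Distribution function of Y_i = 0.25 + 1.5 V_i."""
    return beta33_cdf((x - 0.25) / 1.5)


def F(x1, x2):
    """Joint distribution function of the independent pair (Y_1, Y_2)."""
    return F_marginal(x1) * F_marginal(x2)


def g(x1, x2):
    """Density of X = Y + Z written as a second-order difference of F."""
    return F(x1, x2) - F(x1 - 1, x2) - F(x1, x2 - 1) + F(x1 - 1, x2 - 1)


def inversion_sums(x1, x2, max_shift=10):
    """Truncated double sums of g appearing in the four inversion formulas."""
    down = range(0, max_shift + 1)   # shifts 0, 1, 2, ...
    up = range(1, max_shift + 1)     # shifts 1, 2, 3, ...
    return {
        "--": sum(g(x1 - i, x2 - j) for i in down for j in down),
        "-+": sum(g(x1 - i, x2 + j) for i in down for j in up),
        "+-": sum(g(x1 + i, x2 - j) for i in up for j in down),
        "++": sum(g(x1 + i, x2 + j) for i in up for j in up),
    }


def targets(x1, x2):
    """The four quantities the sums should reproduce."""
    joint, m1, m2 = F(x1, x2), F_marginal(x1), F_marginal(x2)
    return {"--": joint, "-+": m1 - joint, "+-": m2 - joint, "++": 1.0 - m1 - m2 + joint}
```

Comparing `inversion_sums(1.0, 0.8)` with `targets(1.0, 0.8)` shows agreement up to rounding error, and the four targets add up to one.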

The four inversion formulas for $f$ are derived in a similar fashion. From the expression for $g$ in terms of $F$ we have

\[
\frac{\partial^2}{\partial x_1\partial x_2}\, g(x_1,x_2) = f(x_1,x_2) - f(x_1,x_2-1) - f(x_1-1,x_2) + f(x_1-1,x_2-1).
\]

Now, following equations (36) and (37), we obtain

\[
\sum_{i=0}^{\infty}\sum_{j=0}^{\infty} \frac{\partial^2}{\partial x_1\partial x_2}\, g(x_1-i,x_2-j) = f(x_1,x_2). \tag{44}
\]

Here we have used $\lim_{x_1\to-\infty} f(x_1,x_2) = 0$ and $\lim_{x_2\to-\infty} f(x_1,x_2) = 0$.
The other three inversion formulas follow similarly.



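□

6.2 Proof of Theorem 3.1

First we consider the estimator $f^{++}_{nh}$. We have

\begin{align*}
\mathrm{E}\,f^{++}_{nh}(x_1,x_2)
&= \mathrm{E}\Big(\frac{1}{nh^4}\sum_{k=1}^{n}\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} w'\Big(\frac{x_1+i-X_{k1}}{h}\Big)\, w'\Big(\frac{x_2+j-X_{k2}}{h}\Big)\Big) \\
&= \frac{1}{h^4}\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \mathrm{E}\Big( w'\Big(\frac{x_1+i-X_{11}}{h}\Big)\, w'\Big(\frac{x_2+j-X_{12}}{h}\Big)\Big) \\
&= \frac{1}{h^4}\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w'\Big(\frac{x_1+i-u_1}{h}\Big)\, w'\Big(\frac{x_2+j-u_2}{h}\Big)\, g(u_1,u_2)\,du_1du_2. \tag{45}
\end{align*}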



i=i j=i 

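Note that interchanging integrals and sums is allowed because

\[
\frac{1}{h^4}\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \Big| w'\Big(\frac{x_1+i-u_1}{h}\Big)\Big|\, \Big| w'\Big(\frac{x_2+j-u_2}{h}\Big)\Big|\, g(u_1,u_2)\,du_1du_2 < \infty. \tag{46}
\]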



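To check this, we first make the substitutions $v_1 := u_1 - i$ and $v_2 := u_2 - j$. Secondly, we interchange the sums and integrals again, which is allowed because the integrand is nonnegative (Fubini). We get

\begin{align*}
&\frac{1}{h^4}\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \Big| w'\Big(\frac{x_1-v_1}{h}\Big)\, w'\Big(\frac{x_2-v_2}{h}\Big)\Big|\, g(v_1+i,v_2+j)\,dv_1dv_2 \\
&\quad= \frac{1}{h^4}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \Big| w'\Big(\frac{x_1-v_1}{h}\Big)\Big|\, \Big| w'\Big(\frac{x_2-v_2}{h}\Big)\Big| \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} g(v_1+i,v_2+j)\,dv_1dv_2. \tag{47}
\end{align*}

Thirdly, noting that $F^{++}(v_1,v_2) = \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} g(v_1+i,v_2+j)$ and that $F^{++}(v_1,v_2) \le 1$, we obtain

\[
\frac{1}{h^4}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \Big| w'\Big(\frac{x_1-v_1}{h}\Big)\Big|\, \Big| w'\Big(\frac{x_2-v_2}{h}\Big)\Big|\, F^{++}(v_1,v_2)\,dv_1dv_2
\le \frac{1}{h^4}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \Big| w'\Big(\frac{x_1-v_1}{h}\Big)\Big|\, \Big| w'\Big(\frac{x_2-v_2}{h}\Big)\Big|\,dv_1dv_2 < \infty. \tag{48}
\]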



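Because $w'$ is a bounded function and has bounded support, this integral is finite. Thus our use of Fubini's Theorem is justified. Next we apply partial integration twice, yielding

\begin{align*}
\mathrm{E}\,f^{++}_{nh}(x_1,x_2)
&= \frac{1}{h^4}\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \int_{-\infty}^{\infty} w'\Big(\frac{x_2+j-u_2}{h}\Big) \Big\{\int_{-\infty}^{\infty} w'\Big(\frac{x_1+i-u_1}{h}\Big)\, g(u_1,u_2)\,du_1\Big\}\,du_2 \\
&= \frac{1}{h^3}\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \int_{-\infty}^{\infty} w'\Big(\frac{x_2+j-u_2}{h}\Big) \Big\{\int_{-\infty}^{\infty} w\Big(\frac{x_1+i-u_1}{h}\Big)\, \frac{\partial}{\partial u_1} g(u_1,u_2)\,du_1\Big\}\,du_2 \\
&= \frac{1}{h^3}\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \int_{-\infty}^{\infty} w\Big(\frac{x_1+i-u_1}{h}\Big) \Big\{\int_{-\infty}^{\infty} w'\Big(\frac{x_2+j-u_2}{h}\Big)\, \frac{\partial}{\partial u_1} g(u_1,u_2)\,du_2\Big\}\,du_1 \\
&= \frac{1}{h^2}\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w\Big(\frac{x_1+i-u_1}{h}\Big)\, w\Big(\frac{x_2+j-u_2}{h}\Big)\, \frac{\partial^2}{\partial u_1\partial u_2} g(u_1,u_2)\,du_1du_2.
\end{align*}

By the substitutions $v_1 := u_1 - i$ and $v_2 := u_2 - j$ we get

\[
\mathrm{E}\,f^{++}_{nh}(x_1,x_2) = \frac{1}{h^2}\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w\Big(\frac{x_1-v_1}{h}\Big)\, w\Big(\frac{x_2-v_2}{h}\Big)\, \frac{\partial^2}{\partial v_1\partial v_2} g(v_1+i,v_2+j)\,dv_1dv_2. \tag{49}
\]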

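Now we need to interchange integrals and sums again. Therefore, rewrite the equation above as

\begin{align*}
&\frac{1}{h^2}\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w\Big(\frac{x_1-v_1}{h}\Big)\, w\Big(\frac{x_2-v_2}{h}\Big)\, \frac{\partial^2}{\partial v_1\partial v_2} g(v_1+i,v_2+j)\,dv_1dv_2 \\
&\quad= \lim_{M_1\to\infty}\lim_{M_2\to\infty} \frac{1}{h^2}\sum_{i=1}^{M_1}\sum_{j=1}^{M_2} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w\Big(\frac{x_1-v_1}{h}\Big)\, w\Big(\frac{x_2-v_2}{h}\Big)\, \frac{\partial^2}{\partial v_1\partial v_2} g(v_1+i,v_2+j)\,dv_1dv_2 \\
&\quad= \lim_{M_1\to\infty}\lim_{M_2\to\infty} \frac{1}{h^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w\Big(\frac{x_1-v_1}{h}\Big)\, w\Big(\frac{x_2-v_2}{h}\Big) \sum_{i=1}^{M_1}\sum_{j=1}^{M_2} \frac{\partial^2}{\partial v_1\partial v_2} g(v_1+i,v_2+j)\,dv_1dv_2. \tag{50}
\end{align*}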
In (38) we found that $g(v_1+i,v_2) = F(v_1+i,v_2) - F(v_1+i,v_2-1) - F(v_1+i-1,v_2) + F(v_1+i-1,v_2-1)$, so

\[
\frac{\partial^2}{\partial v_1\partial v_2}\, g(v_1+i,v_2) = f(v_1+i,v_2) - f(v_1+i,v_2-1) - f(v_1+i-1,v_2) + f(v_1+i-1,v_2-1).
\]

Following the summation of (39), we find

\begin{align*}
\sum_{i=1}^{M_1} \frac{\partial^2}{\partial v_1\partial v_2}\, g(v_1+i,v_2)
&= \sum_{i=1}^{M_1}\{f(v_1+i,v_2) - f(v_1+i,v_2-1) - f(v_1+i-1,v_2) + f(v_1+i-1,v_2-1)\} \\
&= \sum_{i=1}^{M_1}\{f(v_1+i,v_2) - f(v_1+i-1,v_2)\} + \sum_{i=1}^{M_1}\{f(v_1+i-1,v_2-1) - f(v_1+i,v_2-1)\} \\
&= f(v_1+M_1,v_2) - f(v_1,v_2) + f(v_1,v_2-1) - f(v_1+M_1,v_2-1) \tag{51}
\end{align*}

and

\begin{align*}
\sum_{j=1}^{M_2}\sum_{i=1}^{M_1} \frac{\partial^2}{\partial v_1\partial v_2}\, g(v_1+i,v_2+j)
&= \sum_{j=1}^{M_2}\{f(v_1+M_1,v_2+j) - f(v_1,v_2+j) + f(v_1,v_2+j-1) - f(v_1+M_1,v_2+j-1)\} \\
&= \sum_{j=1}^{M_2}\{f(v_1+M_1,v_2+j) - f(v_1+M_1,v_2+j-1)\} + \sum_{j=1}^{M_2}\{f(v_1,v_2+j-1) - f(v_1,v_2+j)\} \\
&= f(v_1+M_1,v_2+M_2) - f(v_1+M_1,v_2) + f(v_1,v_2) - f(v_1,v_2+M_2). \tag{52}
\end{align*}

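Note that this sum is finite for all $v_1, v_2$, because $f$ is bounded. Also note that changing the order of summation is allowed, because $M_1, M_2 < \infty$. Furthermore, in Theorem 2.1 we found that the sum converges to

\[
\lim_{M_1\to\infty}\lim_{M_2\to\infty} \sum_{i=1}^{M_1}\sum_{j=1}^{M_2} \frac{\partial^2}{\partial v_1\partial v_2}\, g(v_1+i,v_2+j) = f(v_1,v_2) < \infty. \tag{53}
\]

We have assumed that $f$ is bounded, so let $f(v_1,v_2) \le \tfrac{1}{4}A$ for all $v_1, v_2$, where $A > 0$ is a constant. Observe the following inequality

\begin{align*}
&|f(v_1+M_1,v_2+M_2) - f(v_1+M_1,v_2) + f(v_1,v_2) - f(v_1,v_2+M_2)| \\
&\quad\le |f(v_1+M_1,v_2+M_2)| + |f(v_1+M_1,v_2)| + |f(v_1,v_2)| + |f(v_1,v_2+M_2)| \\
&\quad\le A, \tag{54}
\end{align*}

for all $v_1$, $v_2$, $M_1$, and $M_2$. Note that, because $w$ is nonnegative, bounded and has bounded support,

\[
\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w\Big(\frac{x_1-v_1}{h}\Big)\, w\Big(\frac{x_2-v_2}{h}\Big)\,dv_1dv_2 < \infty \tag{55}
\]

for all $x_1, x_2$. Thus we can apply the Lebesgue Dominated Convergence Theorem to (50), and find

\begin{align*}
&\lim_{M_1\to\infty}\lim_{M_2\to\infty} \frac{1}{h^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w\Big(\frac{x_1-v_1}{h}\Big)\, w\Big(\frac{x_2-v_2}{h}\Big) \sum_{i=1}^{M_1}\sum_{j=1}^{M_2} \frac{\partial^2}{\partial v_1\partial v_2} g(v_1+i,v_2+j)\,dv_1dv_2 \\
&\quad= \frac{1}{h^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w\Big(\frac{x_1-v_1}{h}\Big)\, w\Big(\frac{x_2-v_2}{h}\Big) \lim_{M_1\to\infty}\lim_{M_2\to\infty} \sum_{i=1}^{M_1}\sum_{j=1}^{M_2} \frac{\partial^2}{\partial v_1\partial v_2} g(v_1+i,v_2+j)\,dv_1dv_2 \\
&\quad= \frac{1}{h^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w\Big(\frac{x_1-v_1}{h}\Big)\, w\Big(\frac{x_2-v_2}{h}\Big)\, f(v_1,v_2)\,dv_1dv_2. \tag{56}
\end{align*}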

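Summarizing we now have

\[
\mathrm{E}\,f^{++}_{nh}(x_1,x_2) = \frac{1}{h^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w\Big(\frac{x_1-v_1}{h}\Big)\, w\Big(\frac{x_2-v_2}{h}\Big)\, f(v_1,v_2)\,dv_1dv_2. \tag{57}
\]

Substituting $z_1 := (x_1-v_1)/h$ and $z_2 := (x_2-v_2)/h$ we get

\[
\mathrm{E}\,f^{++}_{nh}(x_1,x_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w(z_1)w(z_2)\, f(x_1-hz_1,x_2-hz_2)\,dz_1dz_2. \tag{58}
\]

The multivariate version of Taylor's theorem, as derived in Wand and Jones (1995) for this particular application, allows us to write

\begin{align*}
f(x_1-hz_1,x_2-hz_2) = {}& f(x_1,x_2) - h(z_1f_1 + z_2f_2)(x_1,x_2) \\
&+ \tfrac{1}{2}h^2\big(z_1^2f_{11} + z_1z_2(f_{12}+f_{21}) + z_2^2f_{22}\big)(x_1,x_2) + o(h^2).
\end{align*}

Using the symmetry of $w$, so that the integrals containing odd powers of $z_1$ or $z_2$ vanish, we obtain

\begin{align*}
\mathrm{E}\,f^{++}_{nh}(x_1,x_2)
={}& \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w(z_1)w(z_2)\Big\{f(x_1,x_2) - h(z_1f_1 + z_2f_2)(x_1,x_2) \\
&\qquad + \tfrac{1}{2}h^2\big(z_1^2f_{11} + z_1z_2(f_{12}+f_{21}) + z_2^2f_{22}\big)(x_1,x_2) + o(h^2)\Big\}\,dz_1dz_2 \\
={}& f(x_1,x_2) - hf_1(x_1,x_2)\int\!\!\int z_1w(z_1)w(z_2)\,dz_1dz_2 - hf_2(x_1,x_2)\int\!\!\int z_2w(z_1)w(z_2)\,dz_1dz_2 \\
&+ \tfrac{1}{2}h^2f_{11}(x_1,x_2)\int\!\!\int z_1^2w(z_1)w(z_2)\,dz_1dz_2 \\
&+ \tfrac{1}{2}h^2(f_{12}+f_{21})(x_1,x_2)\int\!\!\int z_1z_2w(z_1)w(z_2)\,dz_1dz_2 \\
&+ \tfrac{1}{2}h^2f_{22}(x_1,x_2)\int\!\!\int z_2^2w(z_1)w(z_2)\,dz_1dz_2 + o(h^2) \\
={}& f(x_1,x_2) + \tfrac{1}{2}h^2\int_{-\infty}^{\infty} z^2w(z)\,dz\,(f_{11}+f_{22})(x_1,x_2) + o(h^2).
\end{align*}

This proves statement (20) of the theorem for this individual estimator.
It is easily seen that

\begin{align*}
\mathrm{E}\,f^{--}_{nh}(x_1,x_2) &= \mathrm{E}\,f^{-+}_{nh}(x_1,x_2) = \mathrm{E}\,f^{+-}_{nh}(x_1,x_2) = \mathrm{E}\,f^{++}_{nh}(x_1,x_2) \\
&= f(x_1,x_2) + \tfrac{1}{2}h^2\int_{-\infty}^{\infty} z^2w(z)\,dz\,(f_{11}+f_{22})(x_1,x_2) + o(h^2),
\end{align*}

and thus $\mathrm{E}\,f^{(t)}_{nh}(x_1,x_2) = f(x_1,x_2) + \tfrac{1}{2}h^2\int z^2w(z)\,dz\,(f_{11}+f_{22})(x_1,x_2) + o(h^2)$, proving equation (20).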
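Next let us derive the asymptotic variance. First, define

\[
U^{++}_{kh}(x_1,x_2) = \frac{1}{h^4}\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} w'\Big(\frac{x_1+i-X_{k1}}{h}\Big)\, w'\Big(\frac{x_2+j-X_{k2}}{h}\Big).
\]

Then $f^{++}_{nh}(x_1,x_2) = \frac{1}{n}\sum_{k=1}^{n} U^{++}_{kh}(x_1,x_2)$, and since the terms $U^{++}_{kh}$ are independent,

\[
\mathrm{Var}\big(f^{++}_{nh}(x_1,x_2)\big) = \frac{1}{n}\,\mathrm{Var}\big(U^{++}_{1h}(x_1,x_2)\big). \tag{60}
\]

Secondly, we will determine the variance of $U^{++}_{1h}(x_1,x_2)$. We have

\[
\mathrm{Var}\big(U^{++}_{1h}(x_1,x_2)\big) = \mathrm{E}\,U^{++}_{1h}(x_1,x_2)^2 - \big(\mathrm{E}\,U^{++}_{1h}(x_1,x_2)\big)^2. \tag{61}
\]

Let us begin with determining $\mathrm{E}\,U^{++}_{1h}(x_1,x_2)^2$. Note that, if $h < \tfrac{1}{2}$, we have

\[
w'\Big(\frac{x_1+i_1-X_{k1}}{h}\Big)\, w'\Big(\frac{x_2+j_1-X_{k2}}{h}\Big)\, w'\Big(\frac{x_1+i_2-X_{k1}}{h}\Big)\, w'\Big(\frac{x_2+j_2-X_{k2}}{h}\Big) = 0 \tag{62}
\]

unless $i_1 = i_2$ and $j_1 = j_2$, where $i_1, i_2, j_1, j_2 \in \mathbb{N}$. This holds because if $i_1 \ne i_2$ or $j_1 \ne j_2$, then at least two of the arguments in the product (62) are more than distance two apart, rendering the product equal to zero. Thus in the following equation, as $h \to 0$, only the square products do not vanish and we can write

\begin{align*}
\mathrm{E}\,U^{++}_{1h}(x_1,x_2)^2
&= \frac{1}{h^8}\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \mathrm{E}\Big( w'\Big(\frac{x_1+i-X_{11}}{h}\Big)^2 w'\Big(\frac{x_2+j-X_{12}}{h}\Big)^2\Big) \\
&= \frac{1}{h^8}\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w'\Big(\frac{x_1+i-u_1}{h}\Big)^2 w'\Big(\frac{x_2+j-u_2}{h}\Big)^2 g(u_1,u_2)\,du_1du_2.
\end{align*}

Now we use the substitutions $v_1 := u_1 - i$ and $v_2 := u_2 - j$ to obtain

\[
\mathrm{E}\,U^{++}_{1h}(x_1,x_2)^2 = \frac{1}{h^8}\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w'\Big(\frac{x_1-v_1}{h}\Big)^2 w'\Big(\frac{x_2-v_2}{h}\Big)^2 g(v_1+i,v_2+j)\,dv_1dv_2.
\]

Note that the integrand is nonnegative, thus interchanging sums and integrals is allowed (Fubini), so

\[
\mathrm{E}\,U^{++}_{1h}(x_1,x_2)^2 = \frac{1}{h^8}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w'\Big(\frac{x_1-v_1}{h}\Big)^2 w'\Big(\frac{x_2-v_2}{h}\Big)^2 F^{++}(v_1,v_2)\,dv_1dv_2.
\]

Now apply the substitutions $z_1 = (x_1-v_1)/h$ and $z_2 = (x_2-v_2)/h$ and recall the bounded support of $w'$. Furthermore, because $\lim_{h\to 0} F^{++}(x_1-hz_1,x_2-hz_2) = F^{++}(x_1,x_2) \le 1$, we can again apply the Lebesgue Dominated Convergence Theorem, which gives

\begin{align*}
\mathrm{E}\,U^{++}_{1h}(x_1,x_2)^2
&= \frac{1}{h^6}\int_{-1}^{1}\int_{-1}^{1} w'(z_1)^2 w'(z_2)^2 F^{++}(x_1-hz_1,x_2-hz_2)\,dz_1dz_2 \\
&= \frac{1}{h^6}\,F^{++}(x_1,x_2)\Big(\int_{-1}^{1} w'(z)^2\,dz\Big)^2 + o(h^{-6}).
\end{align*}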
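Now note that $\mathrm{E}\,U^{++}_{1h}(x_1,x_2) = \mathrm{E}\,f^{++}_{nh}(x_1,x_2) = f(x_1,x_2) + O(h^2)$. So

\begin{align*}
\mathrm{Var}\big(f^{++}_{nh}(x_1,x_2)\big) &= \frac{1}{n}\,\mathrm{Var}\big(U^{++}_{1h}(x_1,x_2)\big) \\
&= \frac{1}{n}\Big\{\frac{1}{h^6}\,F^{++}(x_1,x_2)\Big(\int_{-1}^{1} w'(z)^2\,dz\Big)^2 + o(h^{-6}) - f(x_1,x_2)^2 - O(h^2)\Big\} \\
&= \frac{1}{nh^6}\,F^{++}(x_1,x_2)\Big(\int_{-1}^{1} w'(z)^2\,dz\Big)^2 + o(n^{-1}h^{-6}).
\end{align*}

We can follow a similar procedure to obtain the variances of the other estimators. To summarize we get

\begin{align*}
\mathrm{Var}\big(f^{--}_{nh}(x_1,x_2)\big) &= \frac{1}{nh^6}\,F^{--}(x_1,x_2)\Big(\int_{-1}^{1} w'(z)^2\,dz\Big)^2 + o(n^{-1}h^{-6}), \\
\mathrm{Var}\big(f^{-+}_{nh}(x_1,x_2)\big) &= \frac{1}{nh^6}\,F^{-+}(x_1,x_2)\Big(\int_{-1}^{1} w'(z)^2\,dz\Big)^2 + o(n^{-1}h^{-6}), \\
\mathrm{Var}\big(f^{+-}_{nh}(x_1,x_2)\big) &= \frac{1}{nh^6}\,F^{+-}(x_1,x_2)\Big(\int_{-1}^{1} w'(z)^2\,dz\Big)^2 + o(n^{-1}h^{-6}), \\
\mathrm{Var}\big(f^{++}_{nh}(x_1,x_2)\big) &= \frac{1}{nh^6}\,F^{++}(x_1,x_2)\Big(\int_{-1}^{1} w'(z)^2\,dz\Big)^2 + o(n^{-1}h^{-6}).
\end{align*}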

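Now let us determine the variance of combinations of these estimators. Omitting the arguments $(x_1,x_2)$, we have

\begin{align*}
\mathrm{Var}\big(f^{(t)}_{nh}\big) ={}& \mathrm{Var}\big(t_1f^{--}_{nh} + t_2f^{-+}_{nh} + t_3f^{+-}_{nh} + t_4f^{++}_{nh}\big) \\
={}& t_1^2\,\mathrm{Var}\big(f^{--}_{nh}\big) + t_2^2\,\mathrm{Var}\big(f^{-+}_{nh}\big) + t_3^2\,\mathrm{Var}\big(f^{+-}_{nh}\big) + t_4^2\,\mathrm{Var}\big(f^{++}_{nh}\big) \\
&+ 2t_1t_2\,\mathrm{Cov}\big(f^{--}_{nh}, f^{-+}_{nh}\big) + 2t_1t_3\,\mathrm{Cov}\big(f^{--}_{nh}, f^{+-}_{nh}\big) \\
&+ 2t_1t_4\,\mathrm{Cov}\big(f^{--}_{nh}, f^{++}_{nh}\big) + 2t_2t_3\,\mathrm{Cov}\big(f^{-+}_{nh}, f^{+-}_{nh}\big) \\
&+ 2t_2t_4\,\mathrm{Cov}\big(f^{-+}_{nh}, f^{++}_{nh}\big) + 2t_3t_4\,\mathrm{Cov}\big(f^{+-}_{nh}, f^{++}_{nh}\big).
\end{align*}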

Let us look at $\mathrm{Cov}\big(f^{--}_{nh}(x_1,x_2), f^{-+}_{nh}(x_1,x_2)\big)$. In similar fashion as we determined the variance, we find

\begin{align*}
\mathrm{Cov}\big(f^{--}_{nh}(x_1,x_2), f^{-+}_{nh}(x_1,x_2)\big) &= \frac{1}{n}\,\mathrm{Cov}\big(U^{--}_{1h}(x_1,x_2), U^{-+}_{1h}(x_1,x_2)\big) \\
&= \frac{1}{n}\Big[\mathrm{E}\,U^{--}_{1h}(x_1,x_2)U^{-+}_{1h}(x_1,x_2) - \mathrm{E}\,U^{--}_{1h}(x_1,x_2)\,\mathrm{E}\,U^{-+}_{1h}(x_1,x_2)\Big].
\end{align*}

Let us first determine $\mathrm{E}\,U^{--}_{1h}(x_1,x_2)U^{-+}_{1h}(x_1,x_2)$. Note that, if $h < \tfrac{1}{2}$, we have

\[
w'\Big(\frac{x_1-i_1-X_{k1}}{h}\Big)\, w'\Big(\frac{x_2-j_1-X_{k2}}{h}\Big)\, w'\Big(\frac{x_1-i_2-X_{k1}}{h}\Big)\, w'\Big(\frac{x_2+j_2-X_{k2}}{h}\Big) = 0, \tag{63}
\]

for all $i_1$, $i_2$, $j_1$ and $j_2$. This holds because the second and fourth argument in the product (63) are always more than distance two apart, rendering the product equal to zero. Thus

\[
\mathrm{E}\,U^{--}_{1h}(x_1,x_2)U^{-+}_{1h}(x_1,x_2) = 0. \tag{64}
\]
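The disjoint-support argument behind (63) can be illustrated numerically. The sketch below assumes the biweight kernel of Section 5 and uses hypothetical helper names. It scans a grid of possible observation values and records the largest product of the second and fourth factor of (63), with a downward shift $j_1 \ge 0$ in one factor and an upward shift $j_2 \ge 1$ in the other.

```python
import numpy as np


def biweight_derivative(u):
    """Derivative of the biweight kernel: -15/4 u (1 - u^2) on [-1, 1], else 0."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, -15.0 / 4.0 * u * (1.0 - u**2), 0.0)


def max_cross_product(x2, h, max_shift=5, grid_size=2001):
    """Largest |w'((x2 - j1 - X)/h) * w'((x2 + j2 - X)/h)| over a grid of X values."""
    obs = np.linspace(x2 - max_shift - 1.0, x2 + max_shift + 1.0, grid_size)
    largest = 0.0
    for j1 in range(0, max_shift + 1):       # downward shifts 0, 1, 2, ...
        for j2 in range(1, max_shift + 1):   # upward shifts 1, 2, 3, ...
            down = biweight_derivative((x2 - j1 - obs) / h)
            up = biweight_derivative((x2 + j2 - obs) / h)
            largest = max(largest, float(np.max(np.abs(down * up))))
    return largest
```

With a bandwidth below 1/2 the two arguments differ by more than two, so the product is identically zero; with a larger bandwidth, for example `h = 0.8`, the supports overlap and the product no longer vanishes.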

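Secondly, because we have already determined $\mathrm{E}\,U^{--}_{1h}(x_1,x_2)$ and $\mathrm{E}\,U^{-+}_{1h}(x_1,x_2)$ earlier, we know that

\[
\mathrm{E}\,U^{--}_{1h}(x_1,x_2)\,\mathrm{E}\,U^{-+}_{1h}(x_1,x_2) = f(x_1,x_2)^2 + O(h^2). \tag{65}
\]

Thus

\[
\mathrm{Cov}\big(f^{--}_{nh}(x_1,x_2), f^{-+}_{nh}(x_1,x_2)\big) = \frac{1}{n}\big[-f(x_1,x_2)^2 - O(h^2)\big] = o(n^{-1}h^{-6}). \tag{66}
\]

This result holds for all the covariances. So we arrive at

\begin{align*}
\mathrm{Var}\big(f^{(t)}_{nh}(x_1,x_2)\big) ={}& \big(t_1^2F^{--}(x_1,x_2) + t_2^2F^{-+}(x_1,x_2) + t_3^2F^{+-}(x_1,x_2) + t_4^2F^{++}(x_1,x_2)\big) \\
&\times \frac{1}{nh^6}\Big(\int_{-1}^{1} w'(z)^2\,dz\Big)^2 + o(n^{-1}h^{-6}) \\
={}& B(x_1,x_2,t_1,t_2,t_3,t_4)\,\frac{1}{nh^6}\Big(\int_{-1}^{1} w'(z)^2\,dz\Big)^2 + o(n^{-1}h^{-6}).
\end{align*}

This proves statement (21) of the theorem. □

6.3 Proof of Theorem 4.1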

The convex combination of the four density estimators is given by

\[
f^{(t)}_{nh}(x_1,x_2) = t_1f^{--}_{nh}(x_1,x_2) + t_2f^{-+}_{nh}(x_1,x_2) + t_3f^{+-}_{nh}(x_1,x_2) + t_4f^{++}_{nh}(x_1,x_2), \tag{67}
\]

where $t_1 + t_2 + t_3 + t_4 = 1$. Now define

SlnhiXi,X2) = fnhi^l^^^) " fnh{Xl,X2), 
S2nh{Xl,X2) = -fnh {Xi,X2) + /^/.^ (^1 , X2) , 
S3nh{Xi,X2) = f;^hi^l^^2) - fnh{Xl,X2), 
^4nh(a;i,X2) = -f;thi^l^^2) + fnh{Xi,X2). 



We can rewrite fl67j) as 

/nh(^l'^2) = f~h{Xl,X2) - (ts + ^4)5'l„/i(Xi,X2) - t25'3„/, (Xi , X2) + tiSinhixi, X2) , (68) 



Lemma 6.1 Under the conditions of Theorem 4.1 we have, for $i = 1,\dots,4$,



ESinh{Xi,X2) = 0, 
ESinh{Xi,X2f = 

ES'i„ft(xi,X2)^ = 



(69) 
(70) 

(71) 
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Proof 

We give the proof for Sinhixi,X2)- The other claims can be proved similarly. 
Note that 

Slnh{Xl,X2) = fnh {xi,X2) - fnh {Xl,X2) 
k=l i=0 j=0 

w )w ' 



^ n oo oo 

EEE 



nh^ ^ ^ ^ h h 

k=l i=l j=0 



_ n oo oo . . 

^.1^ L h V( H )■ 

k=l i=—oo j=0 



Define 

oo oo 



rr f ^ ^ ,fXi+i-Xkl\ ,(X2+j-Xk2\ 

uMxuX2) -Y^l^ 2^^ y 1 y 1 )■ (^2) 



i=— oo j=l 

1 V^n 



Then Sinhixi, X2) = ^ Ylk=i Uikhixi, X2) and the terms in the sum are independent. 
Following similar steps as in the proof of Theorem 13.11 we get 



-00 ^-00 i ^ i=_oo j=l 

We also have, as in the same proof, 

'ESinh{xi,X2f = Vai{Sinh{xi,X2)) = - Vai{Uuh{xi, X2)) = o(^). 

Finally we consider the fourth moment of $S_{1nh}(x_1,x_2) = \frac{1}{n}\sum_{k=1}^{n} U_{1kh}(x_1,x_2)$. By independence of the terms we have

\[
\mathrm{E}\,S_{1nh}(x_1,x_2)^4 = \frac{1}{n^3}\,\mathrm{E}\,U_{11h}(x_1,x_2)^4 + \frac{3(n-1)}{n^3}\big(\mathrm{E}\,U_{11h}(x_1,x_2)^2\big)^2.
\]

This completes the proof of the lemma. □
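The fourth-moment identity for an average of independent, identically distributed, mean-zero terms can be confirmed by exact enumeration on a small discrete distribution. The sketch below uses exact rational arithmetic; the three-point distribution and the function names are made up for the illustration.

```python
from fractions import Fraction
from itertools import product


def exact_fourth_moment_of_mean(values, probs, n):
    """E[(mean of n i.i.d. draws)^4], computed by enumerating all outcomes."""
    total = Fraction(0)
    for combo in product(range(len(values)), repeat=n):
        probability = Fraction(1)
        running_sum = Fraction(0)
        for index in combo:
            probability *= probs[index]
            running_sum += values[index]
        total += probability * (running_sum / n) ** 4
    return total


def formula_fourth_moment(values, probs, n):
    """(1/n^3) E U^4 + (3(n-1)/n^3) (E U^2)^2, valid when E U = 0."""
    second = sum(p * v**2 for v, p in zip(values, probs))
    fourth = sum(p * v**4 for v, p in zip(values, probs))
    return fourth / n**3 + Fraction(3 * (n - 1), n**3) * second**2


# A mean-zero three-point distribution (illustrative choice).
values = [Fraction(-1), Fraction(0), Fraction(2)]
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
```

The identity relies on the terms having mean zero, which is why the example distribution is centered.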
From f l68|) we get, omitting the arguments (xi,X2), 

fnh^ ~ fnh ~ ~{'tn3 ~ ^3)'S'ln/i — (tn4 — ii)Sinh — {in2 — i2)S^nh + (^n4 — i4)Si^nh- (73) 

Hence, under the assumptions of the theorem and by the Cauchy Schwarz inequality, we have 
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^ \ fnh ~ fnh I — ^ l'^"3 ~ ^3 1 1 'S'lnfe 1 1 + E |t„4 — ^4] ISinftl + E \tn2 — ^2] |'S'3„/i| + E |t„4 — t4| |S'4„/i 
/ _ \ 1/2 / \ 1/2 / _ \ 1/2 / \ 1/2 

< (e (t„3 - h)') (e + (e (t„4 - u)') (e si,) 

/ _ \ 1/2 / \ 1/2 / _ \ 1/2 / n1/2 

+ (E(U-hf-) (eS|„j) + (e (i„4 - «4)2) (ESil) 

Similarly we have, since {yi + y2 + Vs + Vif < ^{vl + yl + yl + vl), 

Var(/i*;^-/S)<E(/it^-/i2f 

< 4E {tns - hYSlf^ + 4E (t„4 - uYSlf^ + 4E (t„2 - hYS^f^ + 4E (t„4 - t4)^>S'4„;, 

/ \ 1/2 / \ 1/2 / \ 1/2 / \ 1/2 

<4(E(t„3-t3r) (EsQ +A(EiL,-Ur) (ESQ 

/ _ \ 1/2 / N 1/2 / _ X 1/2 / X 1/2 

+ 4(E(t„2-t2r) [EsQ +4(E(t„4-t4r) (e54^„,) 




Since the two bounds above are negligible compared to the order of the bias and variance in Theorem 3.1, it follows that this theorem also holds for the estimator with estimated weights.

In order to prove asymptotic normality note that by Lemma 6.1 and condition (32) it follows that $\sqrt{nh^6}$ times each of the terms in the representation (73) vanishes in probability. Also it follows that $\sqrt{nh^6}$ times the expectation of (73) vanishes asymptotically. Hence the limit distributions of $\sqrt{nh^6}\,\big(f^{(\hat t)}_{nh} - \mathrm{E}\,f^{(\hat t)}_{nh}\big)$ and $\sqrt{nh^6}\,\big(f^{(t)}_{nh} - \mathrm{E}\,f^{(t)}_{nh}\big)$ coincide. The limit distribution of the latter follows by checking the Lyapounov condition for asymptotic normality.

□ 



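6.4 Proof of Theorem 4.2

First we will expand the expected value for $F^{--}_{nh}$. We will skip the proofs for the remaining three two-dimensional estimators, since these can be done in precisely the same manner. We have

\begin{align*}
\mathrm{E}\,F^{--}_{nh}(x_1,x_2)
&= \mathrm{E}\Big(\frac{1}{nh^2}\sum_{k=1}^{n}\sum_{i=0}^{\infty}\sum_{j=0}^{\infty} w\Big(\frac{x_1-i-X_{k1}}{h}\Big)\, w\Big(\frac{x_2-j-X_{k2}}{h}\Big)\Big) \\
&= \frac{1}{h^2}\sum_{i=0}^{\infty}\sum_{j=0}^{\infty} \mathrm{E}\Big( w\Big(\frac{x_1-i-X_{11}}{h}\Big)\, w\Big(\frac{x_2-j-X_{12}}{h}\Big)\Big) \\
&= \frac{1}{h^2}\sum_{j=0}^{\infty}\sum_{i=0}^{\infty} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w\Big(\frac{x_1-i-u_1}{h}\Big)\, w\Big(\frac{x_2-j-u_2}{h}\Big)\, g(u_1,u_2)\,du_1du_2.
\end{align*}

By substituting $v_1 := u_1 + i$ and $v_2 := u_2 + j$ and interchanging integrals and sums we get

\[
\mathrm{E}\,F^{--}_{nh}(x_1,x_2) = \frac{1}{h^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w\Big(\frac{x_1-v_1}{h}\Big)\, w\Big(\frac{x_2-v_2}{h}\Big) \sum_{i=0}^{\infty}\sum_{j=0}^{\infty} g(v_1-i,v_2-j)\,dv_1dv_2.
\]

Interchanging integrals and sums is allowed because the integrand is a nonnegative bounded function.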

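Further, since $\sum_{i=0}^{\infty}\sum_{j=0}^{\infty} g(v_1-i,v_2-j) = F^{--}(v_1,v_2)$, we can continue with

\[
\mathrm{E}\,F^{--}_{nh}(x_1,x_2) = \frac{1}{h^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w\Big(\frac{x_1-v_1}{h}\Big)\, w\Big(\frac{x_2-v_2}{h}\Big)\, F^{--}(v_1,v_2)\,dv_1dv_2. \tag{76}
\]

Next we apply the substitutions $z_1 := (x_1-v_1)/h$ and $z_2 := (x_2-v_2)/h$ to get

\[
\mathrm{E}\,F^{--}_{nh}(x_1,x_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w(z_1)w(z_2)\, F^{--}(x_1-hz_1,x_2-hz_2)\,dz_1dz_2. \tag{77}
\]

The multivariate version of Taylor's theorem allows us to expand $F^{--}(x_1-hz_1,x_2-hz_2)$ as

\begin{align*}
F^{--}(x_1-hz_1,x_2-hz_2) = {}& F^{--}(x_1,x_2) - h\big(z_1F^{--}_1 + z_2F^{--}_2\big)(x_1,x_2) \\
&+ \tfrac{1}{2}h^2\big(z_1^2F^{--}_{11} + z_1z_2(F^{--}_{12} + F^{--}_{21}) + z_2^2F^{--}_{22}\big)(x_1,x_2) + o(h^2), \tag{78}
\end{align*}

where $F^{--}_{11} = \frac{\partial^2}{\partial x_1^2}F^{--}$ and $F^{--}_{12} = \frac{\partial^2}{\partial x_1\partial x_2}F^{--}$. We can plug (78) into (77) and recall that the function $w$ is symmetric. Thus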

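\begin{align*}
\mathrm{E}\,F^{--}_{nh}(x_1,x_2)
={}& \int_{-1}^{1}\int_{-1}^{1} w(z_1)w(z_2)\Big\{F^{--}(x_1,x_2) - h\big(z_1F^{--}_1 + z_2F^{--}_2\big)(x_1,x_2) \\
&\qquad + \tfrac{1}{2}h^2\big(z_1^2F^{--}_{11} + z_1z_2(F^{--}_{12} + F^{--}_{21}) + z_2^2F^{--}_{22}\big)(x_1,x_2) + o(h^2)\Big\}\,dz_1dz_2 \\
={}& F^{--}(x_1,x_2) - hF^{--}_1(x_1,x_2)\int_{-1}^{1}\int_{-1}^{1} z_1w(z_1)w(z_2)\,dz_1dz_2 \\
&- hF^{--}_2(x_1,x_2)\int_{-1}^{1}\int_{-1}^{1} z_2w(z_1)w(z_2)\,dz_1dz_2 \\
&+ \tfrac{1}{2}h^2F^{--}_{11}(x_1,x_2)\int_{-1}^{1}\int_{-1}^{1} z_1^2w(z_1)w(z_2)\,dz_1dz_2 \\
&+ \tfrac{1}{2}h^2\big(F^{--}_{12} + F^{--}_{21}\big)(x_1,x_2)\int_{-1}^{1}\int_{-1}^{1} z_1z_2w(z_1)w(z_2)\,dz_1dz_2 \\
&+ \tfrac{1}{2}h^2F^{--}_{22}(x_1,x_2)\int_{-1}^{1}\int_{-1}^{1} z_2^2w(z_1)w(z_2)\,dz_1dz_2 + o(h^2) \\
={}& F^{--}(x_1,x_2) + \tfrac{1}{2}h^2\big(F^{--}_{11} + F^{--}_{22}\big)(x_1,x_2)\int_{-1}^{1} z^2w(z)\,dz + o(h^2). \tag{79}
\end{align*}

Because the product kernel is supported only on $[-1,1] \times [-1,1]$ it is not necessary to integrate over all of $\mathbb{R}^2$ and we can change the domain of integration.

By following the same arguments for the other three estimators we obtain similar expansions for the expected values.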

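Let us continue with the proof of the variance expansion. Define

\[
U^{--}_{kh}(x_1,x_2) := \frac{1}{h^2}\sum_{i=0}^{\infty}\sum_{j=0}^{\infty} w\Big(\frac{x_1-i-X_{k1}}{h}\Big)\, w\Big(\frac{x_2-j-X_{k2}}{h}\Big). \tag{80}
\]

Then $F^{--}_{nh}(x_1,x_2) = \frac{1}{n}\sum_{k=1}^{n} U^{--}_{kh}(x_1,x_2)$. Since all $U^{--}_{kh}$ are independent, we have

\[
\mathrm{Var}\big(F^{--}_{nh}(x_1,x_2)\big) = \frac{1}{n}\,\mathrm{Var}\big(U^{--}_{1h}(x_1,x_2)\big) = \frac{1}{n}\Big(\mathrm{E}\big(U^{--}_{1h}(x_1,x_2)\big)^2 - \big(\mathrm{E}\,U^{--}_{1h}(x_1,x_2)\big)^2\Big). \tag{81}
\]

First we determine $\mathrm{E}\big(U^{--}_{1h}(x_1,x_2)\big)^2$. Note that, if $h < \tfrac{1}{2}$, we have

\[
w\Big(\frac{x_1-i_1-X_{k1}}{h}\Big)\, w\Big(\frac{x_2-j_1-X_{k2}}{h}\Big)\, w\Big(\frac{x_1-i_2-X_{k1}}{h}\Big)\, w\Big(\frac{x_2-j_2-X_{k2}}{h}\Big) = 0 \tag{82}
\]

unless $i_1 = i_2$ and $j_1 = j_2$, where $i_1, i_2, j_1, j_2 \in \mathbb{N}$. This holds because for any $i_1 \ne i_2$ or $j_1 \ne j_2$, at least one argument of $w$ falls outside the support, rendering the product equal to zero. Thus in the following equation, as $h \to 0$, only the square products are not equal to zero and we can write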

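\begin{align*}
\mathrm{E}\big(U^{--}_{1h}(x_1,x_2)\big)^2
&= \mathrm{E}\Big(\frac{1}{h^2}\sum_{i=0}^{\infty}\sum_{j=0}^{\infty} w\Big(\frac{x_1-i-X_{11}}{h}\Big)\, w\Big(\frac{x_2-j-X_{12}}{h}\Big)\Big)^2 \\
&= \frac{1}{h^4}\sum_{i=0}^{\infty}\sum_{j=0}^{\infty} \mathrm{E}\Big( w^2\Big(\frac{x_1-i-X_{11}}{h}\Big)\, w^2\Big(\frac{x_2-j-X_{12}}{h}\Big)\Big). \tag{83}
\end{align*}

By substituting $v_1 := u_1 + i$ and $v_2 := u_2 + j$ and interchanging integrals and sums, which is allowed because the integrand is nonnegative, we get

\begin{align*}
\mathrm{E}\big(U^{--}_{1h}(x_1,x_2)\big)^2
&= \frac{1}{h^4}\sum_{i=0}^{\infty}\sum_{j=0}^{\infty} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w^2\Big(\frac{x_1-v_1}{h}\Big)\, w^2\Big(\frac{x_2-v_2}{h}\Big)\, g(v_1-i,v_2-j)\,dv_1dv_2 \\
&= \frac{1}{h^4}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w^2\Big(\frac{x_1-v_1}{h}\Big)\, w^2\Big(\frac{x_2-v_2}{h}\Big) \sum_{i=0}^{\infty}\sum_{j=0}^{\infty} g(v_1-i,v_2-j)\,dv_1dv_2 \\
&= \frac{1}{h^4}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} w^2\Big(\frac{x_1-v_1}{h}\Big)\, w^2\Big(\frac{x_2-v_2}{h}\Big)\, F^{--}(v_1,v_2)\,dv_1dv_2.
\end{align*}

Next we apply the substitutions $z_1 := (x_1-v_1)/h$ and $z_2 := (x_2-v_2)/h$. The fact that $\lim_{h\to 0} F^{--}(x_1-hz_1,x_2-hz_2) = F^{--}(x_1,x_2) \le 1$ then yields by the dominated convergence theorem

\begin{align*}
\mathrm{E}\big(U^{--}_{1h}(x_1,x_2)\big)^2
&= \frac{1}{h^2}\int_{-1}^{1}\int_{-1}^{1} w^2(z_1)w^2(z_2)\, F^{--}(x_1-hz_1,x_2-hz_2)\,dz_1dz_2 \\
&= \frac{1}{h^2}\int_{-1}^{1}\int_{-1}^{1} w^2(z_1)w^2(z_2)\, F^{--}(x_1,x_2)\,dz_1dz_2 + o(h^{-2}) \\
&= \frac{1}{h^2}\,F^{--}(x_1,x_2)\Big(\int_{-1}^{1} w^2(z)\,dz\Big)^2 + o(h^{-2}). \tag{84}
\end{align*}

Because the product kernel has support only on $[-1,1] \times [-1,1]$ we are allowed to change the domain of integration.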

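For the term $\big(\mathrm{E}\,U^{--}_{1h}(x_1,x_2)\big)^2$ note that

\[
\mathrm{E}\,U^{--}_{1h}(x_1,x_2) = \mathrm{E}\,F^{--}_{nh}(x_1,x_2) = F^{--}(x_1,x_2) + O(h^2). \tag{85}
\]

So the variance of $F^{--}_{nh}(x_1,x_2)$ is given by

\begin{align*}
\mathrm{Var}\big(F^{--}_{nh}(x_1,x_2)\big)
&= \frac{1}{n}\Big\{\frac{1}{h^2}\,F^{--}(x_1,x_2)\Big(\int_{-1}^{1} w^2(z)\,dz\Big)^2 + o(h^{-2}) - \big(F^{--}(x_1,x_2) + O(h^2)\big)^2\Big\} \tag{86} \\
&= \frac{1}{nh^2}\,F^{--}(x_1,x_2)\Big(\int_{-1}^{1} w^2(z)\,dz\Big)^2 + o(n^{-1}h^{-2}). \tag{87}
\end{align*}

Likewise we may determine the other variances of the two-dimensional distribution estimators.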



□ 



6.5 Proof of Lemma 4.3

Proof Let us first introduce some notation. Define the vectors $v(x_1,x_2)$ and $v_{n\tilde h}(x_1,x_2)$ by

\[
v(x_1,x_2) = \big(F^{--}(x_1,x_2), F^{-+}(x_1,x_2), F^{+-}(x_1,x_2), F^{++}(x_1,x_2)\big),
\]
\[
v_{n\tilde h}(x_1,x_2) = \big(\tilde F^{--}_{n\tilde h}(x_1,x_2), \tilde F^{-+}_{n\tilde h}(x_1,x_2), \tilde F^{+-}_{n\tilde h}(x_1,x_2), \tilde F^{++}_{n\tilde h}(x_1,x_2)\big).
\]

Note that, for $n$ large enough, the components of these vectors are all at least $\epsilon_n$ and at most one.

We will only check (27) and (29) for $i$ equal to one. The other cases can be treated similarly. Then we also need the vector of partial derivatives of the function $t_1(y_1,y_2,y_3,y_4)$. Note that on the line segment between $v_{n\tilde h}(x_1,x_2)$ and $v(x_1,x_2)$ all the components are at least $\epsilon_n$ and at most one. This implies after some computation

||Vtl(t/l,l/2,,Z/3,Z/4)|P < 
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for some constant B, for all points {yi,y2, ,113,114.) on this line segment. 

We can now apply the multivariate mean value theorem and the Cauchy Schwarz inequality 
to get 

(t„i(xi,X2) -ii{xi,X2)f = (ti(v„^(a;i,a;2)) - ti(v(xi, X2)))^ 
= (v„ft(a;i,a;2) - v(xi,X2)) ■ Vti(?/i,?/2, ,l/3,Z/4))^ 

< II X2) - v(xi, X2) f II Vti(?/i, 1/2, , 2/3, y4))f 

< ^ l|v„^(a;i,a;2) - v(xi,X2)f , 

where $(y_1,y_2,y_3,y_4)$ is a point on the line segment between $v_{n\tilde h}(x_1,x_2)$ and $v(x_1,x_2)$. Note that $\|v_{n\tilde h}(x_1,x_2) - v(x_1,x_2)\|^2$ is a sum of four terms like $\big(\tilde F^{--}_{n\tilde h}(x_1,x_2) - F^{--}(x_1,x_2)\big)^2$, which is at most $\big(F^{--}_{n\tilde h}(x_1,x_2) - F^{--}(x_1,x_2)\big)^2$, and that $\mathrm{E}\big(F^{--}_{n\tilde h}(x_1,x_2) - F^{--}(x_1,x_2)\big)^2$ equals the variance plus the squared bias of $F^{--}_{n\tilde h}(x_1,x_2)$. By Theorem 4.2 we can bound these to get

E{Ll{xuX2)~h{x^,X2)f < ^(o(^)+OCh')) = 0{n-'/'{\ognf), 

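for a bandwidth $\tilde h$ of order $n^{-1/6}$. This implies that (27) is satisfied.

Let us now check that (29) is satisfied. By an argument similar to the one above it suffices to check if terms like $\mathrm{E}\big(F^{--}_{n\tilde h}(x_1,x_2) - F^{--}(x_1,x_2)\big)^4$ vanish asymptotically. Write

\[
F^{--}_{n\tilde h}(x_1,x_2) - F^{--}(x_1,x_2) = F^{--}_{n\tilde h}(x_1,x_2) - \mathrm{E}\,F^{--}_{n\tilde h}(x_1,x_2) + \mathrm{E}\,F^{--}_{n\tilde h}(x_1,x_2) - F^{--}(x_1,x_2).
\]

By the triangle inequality we have

\begin{align*}
&\Big(\mathrm{E}\big(F^{--}_{n\tilde h}(x_1,x_2) - F^{--}(x_1,x_2)\big)^4\Big)^{1/4} \\
&\quad\le \Big(\mathrm{E}\big(F^{--}_{n\tilde h}(x_1,x_2) - \mathrm{E}\,F^{--}_{n\tilde h}(x_1,x_2)\big)^4\Big)^{1/4} + \Big(\big(\mathrm{E}\,F^{--}_{n\tilde h}(x_1,x_2) - F^{--}(x_1,x_2)\big)^4\Big)^{1/4}.
\end{align*}

So, by $(a+b)^4 \le 8(a^4+b^4)$, $a, b \ge 0$, we also have

\begin{align*}
&\mathrm{E}\big(F^{--}_{n\tilde h}(x_1,x_2) - F^{--}(x_1,x_2)\big)^4 \\
&\quad\le 8\,\mathrm{E}\big(F^{--}_{n\tilde h}(x_1,x_2) - \mathrm{E}\,F^{--}_{n\tilde h}(x_1,x_2)\big)^4 + 8\big(\mathrm{E}\,F^{--}_{n\tilde h}(x_1,x_2) - F^{--}(x_1,x_2)\big)^4.
\end{align*}

Since the bias vanishes by Theorem 4.2 it suffices to prove the bound of the lemma for the fourth power of the centered error.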

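Recall from the proof of Theorem 4.2 that $F^{--}_{n\tilde h}(x_1,x_2) = \frac{1}{n}\sum_{k=1}^{n} U^{--}_{k\tilde h}(x_1,x_2)$, where

\[
U^{--}_{k\tilde h}(x_1,x_2) = \frac{1}{\tilde h^2}\sum_{i=0}^{\infty}\sum_{j=0}^{\infty} w\Big(\frac{x_1-i-X_{k1}}{\tilde h}\Big)\, w\Big(\frac{x_2-j-X_{k2}}{\tilde h}\Big).
\]

Note that the $U^{--}_{k\tilde h}$ are independent. Now write

\[
F^{--}_{n\tilde h}(x_1,x_2) - \mathrm{E}\,F^{--}_{n\tilde h}(x_1,x_2) = \frac{1}{n}\sum_{k=1}^{n} \bar U^{--}_{k\tilde h}(x_1,x_2),
\]

where $\bar U^{--}_{k\tilde h}(x_1,x_2) = U^{--}_{k\tilde h}(x_1,x_2) - \mathrm{E}\,U^{--}_{k\tilde h}(x_1,x_2)$. Since $\mathrm{E}\,\bar U^{--}_{k\tilde h}(x_1,x_2)$ equals zero we have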



1=1 



Similar to the derivation of ( 184|) we get 



and 

Under the condition on $\tilde h$ in the lemma both terms vanish. This shows that (29) is satisfied as well. Condition (32) follows from condition (29) by the Cauchy-Schwarz inequality. □

6.6 An inequality 

The next lemma can be used to derive the weights that minimize the asymptotic variance of the convex combination of the original four estimators of the density $f$.

Lemma 6.2 Let $a_1,\dots,a_m$ be $m$ positive numbers. Then for all positive $t_1,\dots,t_m$ with $t_1 + \dots + t_m = 1$ we have

\[
a_1t_1^2 + \dots + a_mt_m^2 \ge \frac{a_1a_2\cdots a_m}{s_m(a_1,\dots,a_m)}, \tag{88}
\]

where $s_m(a_1,\dots,a_m)$ is defined by

\[
s_m(a_1,\dots,a_m) = a_2a_3\cdots a_m + \sum_{i=2}^{m-1} a_1\cdots a_{i-1}a_{i+1}\cdots a_m + a_1a_2\cdots a_{m-1}, \tag{89}
\]

the sum of the $m$ products of length $m-1$ obtained by skipping one term in the full product.

The minimum is attained at the $t$ vector given by $t_1 = a_2a_3\cdots a_m/s_m(a_1,\dots,a_m)$ and $t_m = a_1a_2\cdots a_{m-1}/s_m(a_1,\dots,a_m)$ and

\[
t_i = \frac{a_1a_2\cdots a_{i-1}a_{i+1}\cdots a_m}{s_m(a_1,\dots,a_m)}, \quad i = 2,\dots,m-1.
\]

Proof Introduce the inner product $\langle\cdot,\cdot\rangle_a$ and corresponding norm $\|\cdot\|_a$ by

\[
\langle x,y\rangle_a = a_2a_3\cdots a_m\,x_1y_1 + a_1a_3\cdots a_m\,x_2y_2 + \dots + a_1a_2\cdots a_{m-1}\,x_my_m, \tag{90}
\]
\[
\|x\|_a = \big(a_2a_3\cdots a_m\,x_1^2 + a_1a_3\cdots a_m\,x_2^2 + \dots + a_1a_2\cdots a_{m-1}\,x_m^2\big)^{1/2}. \tag{91}
\]

Then, with $1$ equal to the vector of $m$ ones, the Cauchy-Schwarz inequality implies

\begin{align*}
a_1a_2\cdots a_m &= (a_1a_2\cdots a_m)(t_1 + t_2 + \dots + t_m) \\
&= \langle 1, (a_1t_1, a_2t_2, \dots, a_mt_m)\rangle_a \le \|1\|_a\,\|(a_1t_1, a_2t_2, \dots, a_mt_m)\|_a \\
&= \sqrt{s_m(a_1,\dots,a_m)}\,\Big((a_1a_2\cdots a_m)(a_1t_1^2 + a_2t_2^2 + \dots + a_mt_m^2)\Big)^{1/2},
\end{align*}

which implies the inequality after some rewriting.

□
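A quick numerical check of Lemma 6.2 may help to see how it produces the optimal weights. The sketch below, with illustrative function names and made-up values for the $a_i$, computes the skip-one products, the minimizing weight vector and the lower bound, and compares the bound with the objective at randomly drawn points of the simplex.

```python
import numpy as np


def skip_one_products(a):
    """The m products of length m - 1 obtained by leaving out one factor."""
    a = np.asarray(a, dtype=float)
    return np.array([np.prod(np.delete(a, i)) for i in range(a.size)])


def s_m(a):
    """Sum of the skip-one products, as in (89)."""
    return float(skip_one_products(a).sum())


def minimizing_weights(a):
    """The t vector at which the lower bound of the lemma is attained."""
    products = skip_one_products(a)
    return products / products.sum()


def weighted_sum_of_squares(a, t):
    """The objective a_1 t_1^2 + ... + a_m t_m^2."""
    return float(np.sum(np.asarray(a, dtype=float) * np.asarray(t, dtype=float) ** 2))


def lower_bound(a):
    """The bound a_1 ... a_m / s_m(a_1, ..., a_m)."""
    return float(np.prod(np.asarray(a, dtype=float))) / s_m(a)


a = np.array([0.1, 0.3, 0.2, 0.4])            # made-up positive numbers
t_star = minimizing_weights(a)
rng = np.random.default_rng(1)
random_points = rng.dirichlet(np.ones(a.size), size=1000)  # points on the simplex
```

The minimizing weights are proportional to $1/a_i$, so in the setting of Section 4 the estimator whose variance factor is smallest at a given point receives the largest weight.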
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